Re: [RFT][PATCH v7.3 5/8] cpuidle: Return nohz hint from cpuidle_select()
However my fundamental concerns about the policy whether to disable the sched tick remain: Mixing the precise timer and vague heuristic for the decision is dangerous. The timer should not be wrong, heuristic may be. Well, I wouldn't say "dangerous". It may be suboptimal, but even that is not a given. Besides -> Decisions should use actual time points rather than the generic tick duration and residency time. e.g. expected_interval < delta_next_us vs expected_interval < TICK_USEC -> the role of this check is to justify taking the overhead of stopping the tick and it certainly is justifiable if the governor doesn't anticipate any wakeups (timer or not) in the TICK_USEC range. It may be justifiable in other cases too, but that's a matter of some more complex checks and may not be worth the extra complexity at all. I tried that change. It's just just a bit bulky because I cache the result of ktime_to_us(delta_next) early. diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index a6eca02..cad87bf 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -296,6 +296,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, unsigned long nr_iowaiters, cpu_load; int resume_latency = dev_pm_qos_raw_read_value(device); ktime_t delta_next; + unsigned long delta_next_us; if (data->needs_update) { menu_update(drv, dev); @@ -314,6 +315,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, /* determine the expected residency time, round up */ data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(_next)); + delta_next_us = ktime_to_us(delta_next); get_iowait_load(_iowaiters, _load); data->bucket = which_bucket(data->next_timer_us, nr_iowaiters); @@ -364,7 +366,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, */ if (data->predicted_us < TICK_USEC) data->predicted_us = min_t(unsigned int, TICK_USEC, - ktime_to_us(delta_next)); + delta_next_us); } else { /* * Use the performance multiplier and the user-configurable @@ -412,9 +414,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * expected idle duration is shorter than the tick period length. */ if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || - expected_interval < TICK_USEC) { - unsigned int delta_next_us = ktime_to_us(delta_next); - + expected_interval < delta_next_us) { *stop_tick = false; if (!tick_nohz_tick_stopped() && idx > 0 && This works as a I expect in the sense of stopping the tick more often and allowing deeper sleep states in some cases. However, it drastically *increases* the power consumption for some affected workloads test system (SKL-SP). So while I believe this generally improves the behavior - I can't recommend it based on the practical implications. Below are some details for the curious: power consumption for various workload configurations with 250 Hz kernels 4.16.0, v9, v9+delta_next patch: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_250_Hz_power.png Practically v9 has the best power consumption in most cases. The following detailed looks are with 1000 Hz kernels. v9 with a synchronized 950 us sleep workload: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_poll_sync.png v9 with a staggered 950 us sleep workload: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_poll_stagger.png Both show that the sched tick is kept on and this causes more requests to C1E than C6 v9+delta_next with a synchronized 950 us sleep workload: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_delta_poll_sync.png v9+delta_next with a staggered 950 us sleep workload: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_delta_poll_stagger.png No more sched tick, no more C1E requests, but increased power. Besides: - the hardware reports 0 residency in C6 (both core and PKG) for both v9 and v9+delta_next_us. - the increased power consumption comes after a ramp-up of ~200 ms for the staggered and ~2 s for the synchronized workload. For reference traces from an unmodified 4.16.0: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v4.16.0_poll_sync.png https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v4.16.0_poll_stagger.png It behaves similar to the delta_next patch but does not show the increased power consumption in this exact workload configuration. I couldn't help to dig into the effect a bit more and am able to reproduce it even under unmodified kernels with staggered sleep
Re: [RFT][PATCH v7.3 5/8] cpuidle: Return nohz hint from cpuidle_select()
However my fundamental concerns about the policy whether to disable the sched tick remain: Mixing the precise timer and vague heuristic for the decision is dangerous. The timer should not be wrong, heuristic may be. Well, I wouldn't say "dangerous". It may be suboptimal, but even that is not a given. Besides -> Decisions should use actual time points rather than the generic tick duration and residency time. e.g. expected_interval < delta_next_us vs expected_interval < TICK_USEC -> the role of this check is to justify taking the overhead of stopping the tick and it certainly is justifiable if the governor doesn't anticipate any wakeups (timer or not) in the TICK_USEC range. It may be justifiable in other cases too, but that's a matter of some more complex checks and may not be worth the extra complexity at all. I tried that change. It's just just a bit bulky because I cache the result of ktime_to_us(delta_next) early. diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index a6eca02..cad87bf 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -296,6 +296,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, unsigned long nr_iowaiters, cpu_load; int resume_latency = dev_pm_qos_raw_read_value(device); ktime_t delta_next; + unsigned long delta_next_us; if (data->needs_update) { menu_update(drv, dev); @@ -314,6 +315,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, /* determine the expected residency time, round up */ data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(_next)); + delta_next_us = ktime_to_us(delta_next); get_iowait_load(_iowaiters, _load); data->bucket = which_bucket(data->next_timer_us, nr_iowaiters); @@ -364,7 +366,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, */ if (data->predicted_us < TICK_USEC) data->predicted_us = min_t(unsigned int, TICK_USEC, - ktime_to_us(delta_next)); + delta_next_us); } else { /* * Use the performance multiplier and the user-configurable @@ -412,9 +414,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * expected idle duration is shorter than the tick period length. */ if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || - expected_interval < TICK_USEC) { - unsigned int delta_next_us = ktime_to_us(delta_next); - + expected_interval < delta_next_us) { *stop_tick = false; if (!tick_nohz_tick_stopped() && idx > 0 && This works as a I expect in the sense of stopping the tick more often and allowing deeper sleep states in some cases. However, it drastically *increases* the power consumption for some affected workloads test system (SKL-SP). So while I believe this generally improves the behavior - I can't recommend it based on the practical implications. Below are some details for the curious: power consumption for various workload configurations with 250 Hz kernels 4.16.0, v9, v9+delta_next patch: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_250_Hz_power.png Practically v9 has the best power consumption in most cases. The following detailed looks are with 1000 Hz kernels. v9 with a synchronized 950 us sleep workload: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_poll_sync.png v9 with a staggered 950 us sleep workload: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_poll_stagger.png Both show that the sched tick is kept on and this causes more requests to C1E than C6 v9+delta_next with a synchronized 950 us sleep workload: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_delta_poll_sync.png v9+delta_next with a staggered 950 us sleep workload: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_delta_poll_stagger.png No more sched tick, no more C1E requests, but increased power. Besides: - the hardware reports 0 residency in C6 (both core and PKG) for both v9 and v9+delta_next_us. - the increased power consumption comes after a ramp-up of ~200 ms for the staggered and ~2 s for the synchronized workload. For reference traces from an unmodified 4.16.0: https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v4.16.0_poll_sync.png https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v4.16.0_poll_stagger.png It behaves similar to the delta_next patch but does not show the increased power consumption in this exact workload configuration. I couldn't help to dig into the effect a bit more and am able to reproduce it even under unmodified kernels with staggered sleep
Re: [PATCH v9 00/10] sched/cpuidle: Idle loop rework
On 2018-04-08 18:32, Rafael J. Wysocki wrote: The v9 along with some cleanups suggested by Frederic on top of it and with ACKs from Peter (obtained on IRC) is now available from the pm-cpuidle branch in the linux-pm.git tree. It has been added to my linux-next branch, so it probably will be picked up by linux-next tomorrow and I have a plan to push it for v4.17 in the second half of the next week unless a major issue with it is found in the meantime. Great to hear that. Thanks for all your work. I'm finishing up some analysis of corner cases, but nothing major. So I'm glad to see this is moving along. I've been nitpicking a lot, but this is clearly a huge improvement and there are practical limitations against a theoretically perfect solution. In any case the changes will also make future policy adaptions much easier. Thanks, Thomas smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH v9 00/10] sched/cpuidle: Idle loop rework
On 2018-04-08 18:32, Rafael J. Wysocki wrote: The v9 along with some cleanups suggested by Frederic on top of it and with ACKs from Peter (obtained on IRC) is now available from the pm-cpuidle branch in the linux-pm.git tree. It has been added to my linux-next branch, so it probably will be picked up by linux-next tomorrow and I have a plan to push it for v4.17 in the second half of the next week unless a major issue with it is found in the meantime. Great to hear that. Thanks for all your work. I'm finishing up some analysis of corner cases, but nothing major. So I'm glad to see this is moving along. I've been nitpicking a lot, but this is clearly a huge improvement and there are practical limitations against a theoretically perfect solution. In any case the changes will also make future policy adaptions much easier. Thanks, Thomas smime.p7s Description: S/MIME Cryptographic Signature
Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick
On 2018-03-28 12:56, Rafael J. Wysocki wrote: On Wed, Mar 28, 2018 at 12:37 PM, Rafael J. Wysocki <raf...@kernel.org> wrote: On Wed, Mar 28, 2018 at 10:38 AM, Thomas Ilsche <thomas.ils...@tu-dresden.de> wrote: On 2018-03-28 10:13, Rafael J. Wysocki wrote: [cut] So I do $ for cpu in 0 1 2 3; do taskset -c $cpu sh -c 'while true; do usleep 500; done' & done which is a shell kind of imitation of the above and I cannot see this issue at all. I count the number of times data->next_timer_us in menu_select() is greater than TICK_USEC and while this "workload" is running, that number is exactly 0. I'll try with a C program still. And with a C program I see data->next_timer_us greater than TICK_USEC while it is running, so let me dig deeper. I can confirm that a shell-loop behaves differently like you describe. Even if it's just a shell-loop calling "main{usleep(500);}" binary. smime.p7s Description: S/MIME Cryptographic Signature
Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick
On 2018-03-28 12:56, Rafael J. Wysocki wrote: On Wed, Mar 28, 2018 at 12:37 PM, Rafael J. Wysocki wrote: On Wed, Mar 28, 2018 at 10:38 AM, Thomas Ilsche wrote: On 2018-03-28 10:13, Rafael J. Wysocki wrote: [cut] So I do $ for cpu in 0 1 2 3; do taskset -c $cpu sh -c 'while true; do usleep 500; done' & done which is a shell kind of imitation of the above and I cannot see this issue at all. I count the number of times data->next_timer_us in menu_select() is greater than TICK_USEC and while this "workload" is running, that number is exactly 0. I'll try with a C program still. And with a C program I see data->next_timer_us greater than TICK_USEC while it is running, so let me dig deeper. I can confirm that a shell-loop behaves differently like you describe. Even if it's just a shell-loop calling "main{usleep(500);}" binary. smime.p7s Description: S/MIME Cryptographic Signature
Re: [RFT][PATCH v7.3 5/8] cpuidle: Return nohz hint from cpuidle_select()
On 2018-03-22 18:40, Rafael J. Wysocki wrote: From: Rafael J. WysockiAdd a new pointer argument to cpuidle_select() and to the ->select cpuidle governor callback to allow a boolean value indicating whether or not the tick should be stopped before entering the selected state to be returned from there. Make the ladder governor ignore that pointer (to preserve its current behavior) and make the menu governor return 'false" through it if: (1) the idle exit latency is constrained at 0, or (2) the selected state is a polling one, or (3) the expected idle period duration is within the tick period range. In addition to that, the correction factor computations in the menu governor need to take the possibility that the tick may not be stopped into account to avoid artificially small correction factor values. To that end, add a mechanism to record tick wakeups, as suggested by Peter Zijlstra, and use it to modify the menu_update() behavior when tick wakeup occurs. Namely, if the CPU is woken up by the tick and the return value of tick_nohz_get_sleep_length() is not within the tick boundary, the predicted idle duration is likely too short, so make menu_update() try to compensate for that by updating the governor statistics as though the CPU was idle for a long time. Since the value returned through the new argument pointer of cpuidle_select() is not used by its caller yet, this change by itself is not expected to alter the functionality of the code. Signed-off-by: Rafael J. Wysocki --- One more revision here. From the Thomas Ilsche's testing on the Skylake server it looks like data->intervals[] need to be updated along with the correction factor on tick wakeups that occur when next_timer_us is above the tick boundary. The difference between this and the original v7 (of patch [5/8]) is what happens in menu_update(). This time next_timer_us is checked properly and if that is above the tick boundary and a tick wakeup occurs, the function simply sets mesured_us to a large constant and uses that to update both the correction factor and data->intervals[] (the particular value used in this patch was found through a bit of experimentation). Let's see how this works for Thomas and Doug. For easier testing there is a git branch containing this patch (and the rest of the series) at: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \ idle-loop-v7.3 Thanks! Besides the other issue with tick_nohz_get_sleep_length, v7.3 generally works well in idle. So far I don't see anything statistically noticeable, but I saw one peculiar anomaly. After all cores woke up simultaneously to schedule some kworker task, some of them kept the sched tick up, even stayed in shallow sleep state for a while, without having any tasks scheduled. Gradually they chose deeper sleep states and stopped their sched ticks. After 23 ms (1000 Hz kernel), they all went back to deep sleep. https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_3_skl_sp_anomaly.png I have only seen this once so far and can't reproduce it yet, so this particular instance may not be an issue in practice. However my fundamental concerns about the policy whether to disable the sched tick remain: Mixing the precise timer and vague heuristic for the decision is dangerous. The timer should not be wrong, heuristic may be. Decisions should use actual time points rather than the generic tick duration and residency time. e.g. expected_interval < delta_next_us vs expected_interval < TICK_USEC For some cases the unmodified sched tick is not efficient as fallback. Is it feasible to 1) enable the sched tick when it's currently disabled instead of blindly choosing a different C state? 2) modify the next upcoming sched tick to be better suitable as fallback timer? I think with the infrastructure changes it should be possible to implement the policy I envisioned previously (https://marc.info/?l=linux-pm=151384941425947=2), which is based on the ordering of timers and the heuristically predicted idle time. If the sleep_length issue is fixed and I have some mechanism for a modifiable fallback timer, I'll try to demonstrate that on top of your changes. --- drivers/cpuidle/cpuidle.c | 10 +- drivers/cpuidle/governors/ladder.c |3 + drivers/cpuidle/governors/menu.c | 59 + include/linux/cpuidle.h|8 +++-- include/linux/tick.h |2 + kernel/sched/idle.c|4 +- kernel/time/tick-sched.c | 20 7 files changed, 87 insertions(+), 19 deletions(-) Index: linux-pm/include/linux/cpuidle.h === --- linux-pm.orig/include/linux/cpuidle.h +++ linux-pm/include/linux/cpuidle.h @@ -135,7 +135,8 @@ extern bool cpuidle_not_available(struct struct
Re: [RFT][PATCH v7.3 5/8] cpuidle: Return nohz hint from cpuidle_select()
On 2018-03-22 18:40, Rafael J. Wysocki wrote: From: Rafael J. Wysocki Add a new pointer argument to cpuidle_select() and to the ->select cpuidle governor callback to allow a boolean value indicating whether or not the tick should be stopped before entering the selected state to be returned from there. Make the ladder governor ignore that pointer (to preserve its current behavior) and make the menu governor return 'false" through it if: (1) the idle exit latency is constrained at 0, or (2) the selected state is a polling one, or (3) the expected idle period duration is within the tick period range. In addition to that, the correction factor computations in the menu governor need to take the possibility that the tick may not be stopped into account to avoid artificially small correction factor values. To that end, add a mechanism to record tick wakeups, as suggested by Peter Zijlstra, and use it to modify the menu_update() behavior when tick wakeup occurs. Namely, if the CPU is woken up by the tick and the return value of tick_nohz_get_sleep_length() is not within the tick boundary, the predicted idle duration is likely too short, so make menu_update() try to compensate for that by updating the governor statistics as though the CPU was idle for a long time. Since the value returned through the new argument pointer of cpuidle_select() is not used by its caller yet, this change by itself is not expected to alter the functionality of the code. Signed-off-by: Rafael J. Wysocki --- One more revision here. From the Thomas Ilsche's testing on the Skylake server it looks like data->intervals[] need to be updated along with the correction factor on tick wakeups that occur when next_timer_us is above the tick boundary. The difference between this and the original v7 (of patch [5/8]) is what happens in menu_update(). This time next_timer_us is checked properly and if that is above the tick boundary and a tick wakeup occurs, the function simply sets mesured_us to a large constant and uses that to update both the correction factor and data->intervals[] (the particular value used in this patch was found through a bit of experimentation). Let's see how this works for Thomas and Doug. For easier testing there is a git branch containing this patch (and the rest of the series) at: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \ idle-loop-v7.3 Thanks! Besides the other issue with tick_nohz_get_sleep_length, v7.3 generally works well in idle. So far I don't see anything statistically noticeable, but I saw one peculiar anomaly. After all cores woke up simultaneously to schedule some kworker task, some of them kept the sched tick up, even stayed in shallow sleep state for a while, without having any tasks scheduled. Gradually they chose deeper sleep states and stopped their sched ticks. After 23 ms (1000 Hz kernel), they all went back to deep sleep. https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_3_skl_sp_anomaly.png I have only seen this once so far and can't reproduce it yet, so this particular instance may not be an issue in practice. However my fundamental concerns about the policy whether to disable the sched tick remain: Mixing the precise timer and vague heuristic for the decision is dangerous. The timer should not be wrong, heuristic may be. Decisions should use actual time points rather than the generic tick duration and residency time. e.g. expected_interval < delta_next_us vs expected_interval < TICK_USEC For some cases the unmodified sched tick is not efficient as fallback. Is it feasible to 1) enable the sched tick when it's currently disabled instead of blindly choosing a different C state? 2) modify the next upcoming sched tick to be better suitable as fallback timer? I think with the infrastructure changes it should be possible to implement the policy I envisioned previously (https://marc.info/?l=linux-pm=151384941425947=2), which is based on the ordering of timers and the heuristically predicted idle time. If the sleep_length issue is fixed and I have some mechanism for a modifiable fallback timer, I'll try to demonstrate that on top of your changes. --- drivers/cpuidle/cpuidle.c | 10 +- drivers/cpuidle/governors/ladder.c |3 + drivers/cpuidle/governors/menu.c | 59 + include/linux/cpuidle.h|8 +++-- include/linux/tick.h |2 + kernel/sched/idle.c|4 +- kernel/time/tick-sched.c | 20 7 files changed, 87 insertions(+), 19 deletions(-) Index: linux-pm/include/linux/cpuidle.h === --- linux-pm.orig/include/linux/cpuidle.h +++ linux-pm/include/linux/cpuidle.h @@ -135,7 +135,8 @@ extern bool cpuidle_not_available(struct struct cpuidle_device *dev); extern int cpuidle_select(struct
Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick
On 2018-03-28 10:13, Rafael J. Wysocki wrote: On Wed, Mar 28, 2018 at 12:10 AM, Rafael J. Wysocki <r...@rjwysocki.net> wrote: On Tuesday, March 27, 2018 11:50:02 PM CEST Thomas Ilsche wrote: On 2018-03-20 16:45, Rafael J. Wysocki wrote: From: Rafael J. Wysocki <rafael.j.wyso...@intel.com> In order to address the issue with short idle duration predictions by the idle governor after the tick has been stopped, reorder the code in cpuidle_idle_call() so that the governor idle state selection runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned by cpuidle_select() to decide whether or not to stop the tick. This isn't straightforward, because menu_select() invokes tick_nohz_get_sleep_length() to get the time to the next timer event and the number returned by the latter comes from __tick_nohz_idle_enter(). Fortunately, however, it is possible to compute that number without actually stopping the tick and with the help of the existing code. I think something is wrong with the new tick_nohz_get_sleep_length. It seems to return a value that is too large, ignoring immanent non-sched timer. That's a very useful hint, let me have a look. I tested idle-loop-v7.3. It looks very similar to my previous results on the first idle-loop-git-version [1]. Idle and traditional synthetic powernightmares are mostly good. OK But it selects too deep C-states for short idle periods, which is bad for power consumption [2]. That still needs to be improved, then. I tracked this down with additional tests using __attribute__((optimize("O0"))) menu_select and perf probe. With this the behavior seems slightly different, but it shows that data->next_timer_us is: v4.16-rc6: the expected ~500 us [3] idle-loop-v7.3: many milliseconds to minutes [4]. This leads to the governor to wrongly selecting C6. Checking with 372be9e and 6ea0577, I can confirm that the change is introduced by this patch. Yes, that's where the most intrusive reordering happens. Overall, this is an interesting conundrum, because the case in question is when the tick should never be stopped at all during the workload and the code's behavior in that case should not change, so the change was not intentional. Now, from walking through the code, as long as can_stop_idle_tick() returns 'true' all should be fine or at least I don't see why there is any difference in behavior in that case. However, if can_stop_idle_tick() returns 'false' (for example, because need_resched() returns 'true' when it is evaluated), the behavior *is* different in a couple of ways. I sort of know how that can be addressed, but I'd like to reproduce your results here. Are you still using the same workload as before to trigger this behavior? Yes, the exact code I use is as follows $ gcc poller.c -O3 -fopenmp -o poller_omp $ GOMP_CPU_AFFINITY=0-35 ./poller_omp 500 #include #include #include int main(int argc, char *argv[]) { int sleep_us = 1; if (argc == 2) { sleep_us = atoi(argv[1]); } #pragma omp parallel { while (1) { usleep(sleep_us); } } }
Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick
On 2018-03-28 10:13, Rafael J. Wysocki wrote: On Wed, Mar 28, 2018 at 12:10 AM, Rafael J. Wysocki wrote: On Tuesday, March 27, 2018 11:50:02 PM CEST Thomas Ilsche wrote: On 2018-03-20 16:45, Rafael J. Wysocki wrote: From: Rafael J. Wysocki In order to address the issue with short idle duration predictions by the idle governor after the tick has been stopped, reorder the code in cpuidle_idle_call() so that the governor idle state selection runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned by cpuidle_select() to decide whether or not to stop the tick. This isn't straightforward, because menu_select() invokes tick_nohz_get_sleep_length() to get the time to the next timer event and the number returned by the latter comes from __tick_nohz_idle_enter(). Fortunately, however, it is possible to compute that number without actually stopping the tick and with the help of the existing code. I think something is wrong with the new tick_nohz_get_sleep_length. It seems to return a value that is too large, ignoring immanent non-sched timer. That's a very useful hint, let me have a look. I tested idle-loop-v7.3. It looks very similar to my previous results on the first idle-loop-git-version [1]. Idle and traditional synthetic powernightmares are mostly good. OK But it selects too deep C-states for short idle periods, which is bad for power consumption [2]. That still needs to be improved, then. I tracked this down with additional tests using __attribute__((optimize("O0"))) menu_select and perf probe. With this the behavior seems slightly different, but it shows that data->next_timer_us is: v4.16-rc6: the expected ~500 us [3] idle-loop-v7.3: many milliseconds to minutes [4]. This leads to the governor to wrongly selecting C6. Checking with 372be9e and 6ea0577, I can confirm that the change is introduced by this patch. Yes, that's where the most intrusive reordering happens. Overall, this is an interesting conundrum, because the case in question is when the tick should never be stopped at all during the workload and the code's behavior in that case should not change, so the change was not intentional. Now, from walking through the code, as long as can_stop_idle_tick() returns 'true' all should be fine or at least I don't see why there is any difference in behavior in that case. However, if can_stop_idle_tick() returns 'false' (for example, because need_resched() returns 'true' when it is evaluated), the behavior *is* different in a couple of ways. I sort of know how that can be addressed, but I'd like to reproduce your results here. Are you still using the same workload as before to trigger this behavior? Yes, the exact code I use is as follows $ gcc poller.c -O3 -fopenmp -o poller_omp $ GOMP_CPU_AFFINITY=0-35 ./poller_omp 500 #include #include #include int main(int argc, char *argv[]) { int sleep_us = 1; if (argc == 2) { sleep_us = atoi(argv[1]); } #pragma omp parallel { while (1) { usleep(sleep_us); } } }
Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick
On 2018-03-20 16:45, Rafael J. Wysocki wrote: From: Rafael J. WysockiIn order to address the issue with short idle duration predictions by the idle governor after the tick has been stopped, reorder the code in cpuidle_idle_call() so that the governor idle state selection runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned by cpuidle_select() to decide whether or not to stop the tick. This isn't straightforward, because menu_select() invokes tick_nohz_get_sleep_length() to get the time to the next timer event and the number returned by the latter comes from __tick_nohz_idle_enter(). Fortunately, however, it is possible to compute that number without actually stopping the tick and with the help of the existing code. I think something is wrong with the new tick_nohz_get_sleep_length. It seems to return a value that is too large, ignoring immanent non-sched timer. I tested idle-loop-v7.3. It looks very similar to my previous results on the first idle-loop-git-version [1]. Idle and traditional synthetic powernightmares are mostly good. But it selects too deep C-states for short idle periods, which is bad for power consumption [2]. I tracked this down with additional tests using __attribute__((optimize("O0"))) menu_select and perf probe. With this the behavior seems slightly different, but it shows that data->next_timer_us is: v4.16-rc6: the expected ~500 us [3] idle-loop-v7.3: many milliseconds to minutes [4]. This leads to the governor to wrongly selecting C6. Checking with 372be9e and 6ea0577, I can confirm that the change is introduced by this patch. [1] https://lkml.org/lkml/2018/3/20/238 [2] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_3_skl_sp.png [3] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/next_timer_us-v4.16-rc6.png [4] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/next_timer_us-idle-loop-v7.3.png Namely, notice that tick_nohz_stop_sched_tick() already computes the next timer event time to reprogram the scheduler tick hrtimer and that time can be used as a proxy for the actual next timer event time in the idle duration predicition. Moreover, it is possible to split tick_nohz_stop_sched_tick() into two separate routines, one computing the time to the next timer event and the other simply stopping the tick when the time to the next timer event is known. Accordingly, split tick_nohz_stop_sched_tick() into tick_nohz_next_event() and tick_nohz_stop_tick() and use the former in tick_nohz_get_sleep_length(). Add two new extra fields, timer_expires and timer_expires_base, to struct tick_sched for passing data between these two new functions and to indicate that tick_nohz_next_event() has run and tick_nohz_stop_tick() can be called now. Also drop the now redundant sleep_length field from there. Signed-off-by: Rafael J. Wysocki --- v5 -> v7: * Rebase on top of the new [5/8]. --- include/linux/tick.h |2 kernel/sched/idle.c | 11 ++- kernel/time/tick-sched.c | 156 +++ kernel/time/tick-sched.h |6 + 4 files changed, 120 insertions(+), 55 deletions(-) Index: linux-pm/kernel/time/tick-sched.h === --- linux-pm.orig/kernel/time/tick-sched.h +++ linux-pm/kernel/time/tick-sched.h @@ -38,7 +38,8 @@ enum tick_nohz_mode { * @idle_exittime:Time when the idle state was left * @idle_sleeptime: Sum of the time slept in idle with sched tick stopped * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding - * @sleep_length: Duration of the current idle sleep + * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped) + * @timer_expires_base:Base time clock monotonic for @timer_expires * @do_timer_lst: CPU was the last one doing do_timer before going idle */ struct tick_sched { @@ -58,8 +59,9 @@ struct tick_sched { ktime_t idle_exittime; ktime_t idle_sleeptime; ktime_t iowait_sleeptime; - ktime_t sleep_length; unsigned long last_jiffies; + u64 timer_expires; + u64 timer_expires_base; u64 next_timer; ktime_t idle_expires; int do_timer_last; Index: linux-pm/kernel/sched/idle.c === --- linux-pm.orig/kernel/sched/idle.c +++ linux-pm/kernel/sched/idle.c @@ -190,13 +190,18 @@ static void cpuidle_idle_call(void) } else { bool stop_tick = true; - tick_nohz_idle_stop_tick(); - rcu_idle_enter(); - /*
Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick
On 2018-03-20 16:45, Rafael J. Wysocki wrote: From: Rafael J. Wysocki In order to address the issue with short idle duration predictions by the idle governor after the tick has been stopped, reorder the code in cpuidle_idle_call() so that the governor idle state selection runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned by cpuidle_select() to decide whether or not to stop the tick. This isn't straightforward, because menu_select() invokes tick_nohz_get_sleep_length() to get the time to the next timer event and the number returned by the latter comes from __tick_nohz_idle_enter(). Fortunately, however, it is possible to compute that number without actually stopping the tick and with the help of the existing code. I think something is wrong with the new tick_nohz_get_sleep_length. It seems to return a value that is too large, ignoring immanent non-sched timer. I tested idle-loop-v7.3. It looks very similar to my previous results on the first idle-loop-git-version [1]. Idle and traditional synthetic powernightmares are mostly good. But it selects too deep C-states for short idle periods, which is bad for power consumption [2]. I tracked this down with additional tests using __attribute__((optimize("O0"))) menu_select and perf probe. With this the behavior seems slightly different, but it shows that data->next_timer_us is: v4.16-rc6: the expected ~500 us [3] idle-loop-v7.3: many milliseconds to minutes [4]. This leads to the governor to wrongly selecting C6. Checking with 372be9e and 6ea0577, I can confirm that the change is introduced by this patch. [1] https://lkml.org/lkml/2018/3/20/238 [2] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_3_skl_sp.png [3] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/next_timer_us-v4.16-rc6.png [4] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/next_timer_us-idle-loop-v7.3.png Namely, notice that tick_nohz_stop_sched_tick() already computes the next timer event time to reprogram the scheduler tick hrtimer and that time can be used as a proxy for the actual next timer event time in the idle duration predicition. Moreover, it is possible to split tick_nohz_stop_sched_tick() into two separate routines, one computing the time to the next timer event and the other simply stopping the tick when the time to the next timer event is known. Accordingly, split tick_nohz_stop_sched_tick() into tick_nohz_next_event() and tick_nohz_stop_tick() and use the former in tick_nohz_get_sleep_length(). Add two new extra fields, timer_expires and timer_expires_base, to struct tick_sched for passing data between these two new functions and to indicate that tick_nohz_next_event() has run and tick_nohz_stop_tick() can be called now. Also drop the now redundant sleep_length field from there. Signed-off-by: Rafael J. Wysocki --- v5 -> v7: * Rebase on top of the new [5/8]. --- include/linux/tick.h |2 kernel/sched/idle.c | 11 ++- kernel/time/tick-sched.c | 156 +++ kernel/time/tick-sched.h |6 + 4 files changed, 120 insertions(+), 55 deletions(-) Index: linux-pm/kernel/time/tick-sched.h === --- linux-pm.orig/kernel/time/tick-sched.h +++ linux-pm/kernel/time/tick-sched.h @@ -38,7 +38,8 @@ enum tick_nohz_mode { * @idle_exittime:Time when the idle state was left * @idle_sleeptime: Sum of the time slept in idle with sched tick stopped * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding - * @sleep_length: Duration of the current idle sleep + * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped) + * @timer_expires_base:Base time clock monotonic for @timer_expires * @do_timer_lst: CPU was the last one doing do_timer before going idle */ struct tick_sched { @@ -58,8 +59,9 @@ struct tick_sched { ktime_t idle_exittime; ktime_t idle_sleeptime; ktime_t iowait_sleeptime; - ktime_t sleep_length; unsigned long last_jiffies; + u64 timer_expires; + u64 timer_expires_base; u64 next_timer; ktime_t idle_expires; int do_timer_last; Index: linux-pm/kernel/sched/idle.c === --- linux-pm.orig/kernel/sched/idle.c +++ linux-pm/kernel/sched/idle.c @@ -190,13 +190,18 @@ static void cpuidle_idle_call(void) } else { bool stop_tick = true; - tick_nohz_idle_stop_tick(); - rcu_idle_enter(); - /* * Ask the cpuidle framework to choose a convenient
Re: [RFT][PATCH v7 5/8] cpuidle: Return nohz hint from cpuidle_select()
On 2018-03-21 23:15, Rafael J. Wysocki wrote: On Wed, Mar 21, 2018 at 6:59 PM, Thomas Ilsche <thomas.ils...@tu-dresden.de> wrote: On 2018-03-21 15:36, Rafael J. Wysocki wrote: So please disregard this one entirely and take the v7.2 replacement instead of it:https://patchwork.kernel.org/patch/10299429/ The current versions (including the above) is in the git branch at git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \ idle-loop-v7.2 With v7.2 (tested on SKL-SP from git) I see similar behavior in idle as with v5: several cores which just keep the sched tick enabled. Worse yet, some go only in C1 (not even C1E!?) despite sleeping the full sched tick. The resulting power consumption is ~105 W instead of ~ 70 W. https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png I have briefly ran v7 and I believe it was also affected. Then it looks like menu_select() stubbornly thinks that the idle duration will be within the tick boundary on those cores. That may be because the bumping up of the correction factor in menu_reflect() is too conservative or it may be necessary to do something radical to measured_us in menu_update() in case of a tick wakeup combined with a large next_timer_us value. For starters, please see if the attached patch (on top of the idle-loop-v7.2 git branch) changes this behavior in any way. The patch on top of idle-loop-v7.2 doesn't improve idle behavior on SKL-SP. Overall it is pretty erratic, I have not seen any regular patterns. Sometimes only few cpus are affected, here's a screenshot of almost all cpus being affected after a short burst workload. https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_reflect_skl_sp_idle.png
Re: [RFT][PATCH v7 5/8] cpuidle: Return nohz hint from cpuidle_select()
On 2018-03-21 23:15, Rafael J. Wysocki wrote: On Wed, Mar 21, 2018 at 6:59 PM, Thomas Ilsche wrote: On 2018-03-21 15:36, Rafael J. Wysocki wrote: So please disregard this one entirely and take the v7.2 replacement instead of it:https://patchwork.kernel.org/patch/10299429/ The current versions (including the above) is in the git branch at git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \ idle-loop-v7.2 With v7.2 (tested on SKL-SP from git) I see similar behavior in idle as with v5: several cores which just keep the sched tick enabled. Worse yet, some go only in C1 (not even C1E!?) despite sleeping the full sched tick. The resulting power consumption is ~105 W instead of ~ 70 W. https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png I have briefly ran v7 and I believe it was also affected. Then it looks like menu_select() stubbornly thinks that the idle duration will be within the tick boundary on those cores. That may be because the bumping up of the correction factor in menu_reflect() is too conservative or it may be necessary to do something radical to measured_us in menu_update() in case of a tick wakeup combined with a large next_timer_us value. For starters, please see if the attached patch (on top of the idle-loop-v7.2 git branch) changes this behavior in any way. The patch on top of idle-loop-v7.2 doesn't improve idle behavior on SKL-SP. Overall it is pretty erratic, I have not seen any regular patterns. Sometimes only few cpus are affected, here's a screenshot of almost all cpus being affected after a short burst workload. https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_reflect_skl_sp_idle.png
Re: [RFT][PATCH v7 5/8] cpuidle: Return nohz hint from cpuidle_select()
On 2018-03-21 15:36, Rafael J. Wysocki wrote: So please disregard this one entirely and take the v7.2 replacement instead of it:https://patchwork.kernel.org/patch/10299429/ The current versions (including the above) is in the git branch at git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \ idle-loop-v7.2 With v7.2 (tested on SKL-SP from git) I see similar behavior in idle as with v5: several cores which just keep the sched tick enabled. Worse yet, some go only in C1 (not even C1E!?) despite sleeping the full sched tick. The resulting power consumption is ~105 W instead of ~ 70 W. https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png I have briefly ran v7 and I believe it was also affected.
Re: [RFT][PATCH v7 5/8] cpuidle: Return nohz hint from cpuidle_select()
On 2018-03-21 15:36, Rafael J. Wysocki wrote: So please disregard this one entirely and take the v7.2 replacement instead of it:https://patchwork.kernel.org/patch/10299429/ The current versions (including the above) is in the git branch at git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \ idle-loop-v7.2 With v7.2 (tested on SKL-SP from git) I see similar behavior in idle as with v5: several cores which just keep the sched tick enabled. Worse yet, some go only in C1 (not even C1E!?) despite sleeping the full sched tick. The resulting power consumption is ~105 W instead of ~ 70 W. https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png I have briefly ran v7 and I believe it was also affected.
Re: [RFT][PATCH v5 0/7] sched/cpuidle: Idle loop rework
han the deepest available one with target residency +* within the tick period range. +* +* This allows the tick to be stopped even if the +* predicted idle duration is within the tick period +* range to counter the effect by which the prediction +* may be skewed towards lower values due to the tick +* bias. +*/ + expected_interval = s->target_residency; break; BTW, I guess I need to explain the motivation here more thoroughly, so here it goes. The governor predicts idle duration under the assumption that the tick will be stopped, so if the result of the prediction is within the tick period range and it is not accurate, that needs to be taken into account in the governor's statistics. However, if the tick is allowed to run every time the governor predicts idle duration within the tick period range, the governor will always see that it was "almost right" and the correction factor applied by it to improve the prediction next time will not be sufficient. For this reason, it is better to stop the tick at least sometimes when the governor predicts idle duration within the tick period range and the idea here is to do that when the selected state is the deepest available one with the target residency within the tick period range. This allows the opportunity to save more energy to be seized which balances the extra overhead of stopping the tick. HTH -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-37773 E-Mail: thomas.ils...@tu-dresden.de smime.p7s Description: S/MIME Cryptographic Signature
Re: [RFT][PATCH v5 0/7] sched/cpuidle: Idle loop rework
* within the tick period range. +* +* This allows the tick to be stopped even if the +* predicted idle duration is within the tick period +* range to counter the effect by which the prediction +* may be skewed towards lower values due to the tick +* bias. +*/ + expected_interval = s->target_residency; break; BTW, I guess I need to explain the motivation here more thoroughly, so here it goes. The governor predicts idle duration under the assumption that the tick will be stopped, so if the result of the prediction is within the tick period range and it is not accurate, that needs to be taken into account in the governor's statistics. However, if the tick is allowed to run every time the governor predicts idle duration within the tick period range, the governor will always see that it was "almost right" and the correction factor applied by it to improve the prediction next time will not be sufficient. For this reason, it is better to stop the tick at least sometimes when the governor predicts idle duration within the tick period range and the idea here is to do that when the selected state is the deepest available one with the target residency within the tick period range. This allows the opportunity to save more energy to be seized which balances the extra overhead of stopping the tick. HTH -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-37773 E-Mail: thomas.ils...@tu-dresden.de smime.p7s Description: S/MIME Cryptographic Signature
Re: [RFT][PATCH v5 7/7] cpuidle: menu: Avoid selecting shallow states with stopped tick
On 2018-03-15 23:19, Rafael J. Wysocki wrote: From: Rafael J. WysockiIf the scheduler tick has been stopped already and the governor selects a shallow idle state, the CPU can spend a long time in that state if the selection is based on an inaccurate prediction of idle time. That effect turns out to be noticeable, so it needs to be mitigated. What are some common causes for that situation? How could I trigger this for testing? To that end, modify the menu governor to discard the result of the idle time prediction if the tick is stopped and the predicted idle time is less than the tick period length, unless the tick timer is going to expire soon. This seems dangerous. Using a C-state that is too deep could be problematic for soft latency, caches and overall energy. Would it be viable to re-enable the sched tick to act as a fallback? Generally, would it be feasible to modify the upcoming sched tick timer to be a better time for a fallback wakeup in certain situations? Signed-off-by: Rafael J. Wysocki --- v4 -> v5: * Rebase on top of the new [1-6/7]. * Never use the interactivity factor when the tick is stopped. --- drivers/cpuidle/governors/menu.c | 29 ++--- 1 file changed, 22 insertions(+), 7 deletions(-) Index: linux-pm/drivers/cpuidle/governors/menu.c === --- linux-pm.orig/drivers/cpuidle/governors/menu.c +++ linux-pm/drivers/cpuidle/governors/menu.c @@ -353,13 +353,28 @@ static int menu_select(struct cpuidle_dr */ data->predicted_us = min(data->predicted_us, expected_interval); - /* -* Use the performance multiplier and the user-configurable -* latency_req to determine the maximum exit latency. -*/ - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load); - if (latency_req > interactivity_req) - latency_req = interactivity_req; + if (tick_nohz_tick_stopped()) { + /* +* If the tick is already stopped, the cost of possible short +* idle duration misprediction is much higher, because the CPU +* may be stuck in a shallow idle state for a long time as a +* result of it. In that case say we might mispredict and try +* to force the CPU into a state for which we would have stopped +* the tick, unless the tick timer is going to expire really +* soon anyway. +*/ + if (data->predicted_us < TICK_USEC_HZ) + data->predicted_us = min_t(unsigned int, TICK_USEC_HZ, + ktime_to_us(tick_time)); This applies to the heuristic (expected_interval) and the (heuristically corrected) next timer. Should this modification be applied only to the expected_interval under the assumption that the next_timer_us * correction is never totally wrong. + } else { + /* +* Use the performance multiplier and the user-configurable +* latency_req to determine the maximum exit latency. +*/ + interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load); + if (latency_req > interactivity_req) + latency_req = interactivity_req; + } expected_interval = data->predicted_us; /*
Re: [RFT][PATCH v5 7/7] cpuidle: menu: Avoid selecting shallow states with stopped tick
On 2018-03-15 23:19, Rafael J. Wysocki wrote: From: Rafael J. Wysocki If the scheduler tick has been stopped already and the governor selects a shallow idle state, the CPU can spend a long time in that state if the selection is based on an inaccurate prediction of idle time. That effect turns out to be noticeable, so it needs to be mitigated. What are some common causes for that situation? How could I trigger this for testing? To that end, modify the menu governor to discard the result of the idle time prediction if the tick is stopped and the predicted idle time is less than the tick period length, unless the tick timer is going to expire soon. This seems dangerous. Using a C-state that is too deep could be problematic for soft latency, caches and overall energy. Would it be viable to re-enable the sched tick to act as a fallback? Generally, would it be feasible to modify the upcoming sched tick timer to be a better time for a fallback wakeup in certain situations? Signed-off-by: Rafael J. Wysocki --- v4 -> v5: * Rebase on top of the new [1-6/7]. * Never use the interactivity factor when the tick is stopped. --- drivers/cpuidle/governors/menu.c | 29 ++--- 1 file changed, 22 insertions(+), 7 deletions(-) Index: linux-pm/drivers/cpuidle/governors/menu.c === --- linux-pm.orig/drivers/cpuidle/governors/menu.c +++ linux-pm/drivers/cpuidle/governors/menu.c @@ -353,13 +353,28 @@ static int menu_select(struct cpuidle_dr */ data->predicted_us = min(data->predicted_us, expected_interval); - /* -* Use the performance multiplier and the user-configurable -* latency_req to determine the maximum exit latency. -*/ - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load); - if (latency_req > interactivity_req) - latency_req = interactivity_req; + if (tick_nohz_tick_stopped()) { + /* +* If the tick is already stopped, the cost of possible short +* idle duration misprediction is much higher, because the CPU +* may be stuck in a shallow idle state for a long time as a +* result of it. In that case say we might mispredict and try +* to force the CPU into a state for which we would have stopped +* the tick, unless the tick timer is going to expire really +* soon anyway. +*/ + if (data->predicted_us < TICK_USEC_HZ) + data->predicted_us = min_t(unsigned int, TICK_USEC_HZ, + ktime_to_us(tick_time)); This applies to the heuristic (expected_interval) and the (heuristically corrected) next timer. Should this modification be applied only to the expected_interval under the assumption that the next_timer_us * correction is never totally wrong. + } else { + /* +* Use the performance multiplier and the user-configurable +* latency_req to determine the maximum exit latency. +*/ + interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load); + if (latency_req > interactivity_req) + latency_req = interactivity_req; + } expected_interval = data->predicted_us; /*
Re: [RFT][PATCH v5 0/7] sched/cpuidle: Idle loop rework
Over the last week I tested v4+pollv2 and now v5+pollv3. With v5, I observe a particular idle behavior, that I have not seen before with v4. On a dual-socket Skylake system the idle power increases from 74.1 W (system total) to 85.5 W with a 300 HZ build and even to 138.3 W with a 1000 HZ build. A similar Haswell-EP system is also affected. There are phases during which one core will keep switching to the highest C-state, but not disable the sched tick. Every 4th sched tick, a kworker on that core is scheduled shortly. Every wakeup from C6 of a single core will more than double the package power consumption of *both8 sockets for ~500 us resulting in the significantly increased sustained power consumption. This is illustrated in [1]. For a comparison of a "normal" phase (samekernel), see [2]. For a global view of the effect on a 1000 Hz build, see [3]. I have not yet found any particular triggers or the specific interaction between the sched tick and the kworker. I'm not sure how this was introduced in v5. I would guess it could be a feedback loop that I was concerned about initially. I have more findings from v4, but this seems much more impactful. [1] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_300Hz.png [2] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_300Hz_ok.png [3] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_1000Hz.png On 2018-03-15 22:59, Rafael J. Wysocki wrote: Hi All, Thanks a lot for the feedback so far! One more respin after the last batch of comments from Peter and Frederic. The previous summary that still applies: On Sunday, March 4, 2018 11:21:30 PM CET Rafael J. Wysocki wrote: The problem is that if we stop the sched tick in tick_nohz_idle_enter() and then the idle governor predicts short idle duration, we lose regardless of whether or not it is right. If it is right, we've lost already, because we stopped the tick unnecessarily. If it is not right, we'll lose going forward, because the idle state selected by the governor is going to be too shallow and we'll draw too much power (that has been reported recently to actually happen often enough for people to care). This patch series is an attempt to improve the situation and the idea here is to make the decision whether or not to stop the tick deeper in the idle loop and in particular after running the idle state selection in the path where the idle governor is invoked. This way the problem can be avoided, because the idle duration predicted by the idle governor can be used to decide whether or not to stop the tick so that the tick is only stopped if that value is large enough (and, consequently, the idle state selected by the governor is deep enough). The series tires to avoid adding too much new code, rather reorder the existing code and make it more fine-grained. Patch 1 prepares the tick-sched code for the subsequent modifications and it doesn't change the code's functionality (at least not intentionally). Patch 2 starts pushing the tick stopping decision deeper into the idle loop, but that is limited to do_idle() and tick_nohz_irq_exit(). Patch 3 makes cpuidle_idle_call() decide whether or not to stop the tick and sets the stage for the subsequent changes. Patch 4 adds a bool pointer argument to cpuidle_select() and the ->select governor callback allowing them to return a "nohz" hint on whether or not to stop the tick to the caller. It also adds code to decide what value to return as "nohz" to the menu governor. Patch 5 reorders the idle state selection with respect to the stopping of the tick and causes the additional "nohz" hint from cpuidle_select() to be used for deciding whether or not to stop the tick. Patch 6 causes the menu governor to refine the state selection in case the tick is not going to be stopped and the already selected state may not fit before the next tick time. Patch 7 Deals with the situation in which the tick was stopped previously, but the idle governor still predicts short idle. This series is complementary to the poll_idle() patch at https://patchwork.kernel.org/patch/10282237/ Thanks, Rafael -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-37773 E-Mail: thomas.ils...@tu-dresden.de smime.p7s Description: S/MIME Cryptographic Signature
Re: [RFT][PATCH v5 0/7] sched/cpuidle: Idle loop rework
Over the last week I tested v4+pollv2 and now v5+pollv3. With v5, I observe a particular idle behavior, that I have not seen before with v4. On a dual-socket Skylake system the idle power increases from 74.1 W (system total) to 85.5 W with a 300 HZ build and even to 138.3 W with a 1000 HZ build. A similar Haswell-EP system is also affected. There are phases during which one core will keep switching to the highest C-state, but not disable the sched tick. Every 4th sched tick, a kworker on that core is scheduled shortly. Every wakeup from C6 of a single core will more than double the package power consumption of *both8 sockets for ~500 us resulting in the significantly increased sustained power consumption. This is illustrated in [1]. For a comparison of a "normal" phase (samekernel), see [2]. For a global view of the effect on a 1000 Hz build, see [3]. I have not yet found any particular triggers or the specific interaction between the sched tick and the kworker. I'm not sure how this was introduced in v5. I would guess it could be a feedback loop that I was concerned about initially. I have more findings from v4, but this seems much more impactful. [1] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_300Hz.png [2] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_300Hz_ok.png [3] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_1000Hz.png On 2018-03-15 22:59, Rafael J. Wysocki wrote: Hi All, Thanks a lot for the feedback so far! One more respin after the last batch of comments from Peter and Frederic. The previous summary that still applies: On Sunday, March 4, 2018 11:21:30 PM CET Rafael J. Wysocki wrote: The problem is that if we stop the sched tick in tick_nohz_idle_enter() and then the idle governor predicts short idle duration, we lose regardless of whether or not it is right. If it is right, we've lost already, because we stopped the tick unnecessarily. If it is not right, we'll lose going forward, because the idle state selected by the governor is going to be too shallow and we'll draw too much power (that has been reported recently to actually happen often enough for people to care). This patch series is an attempt to improve the situation and the idea here is to make the decision whether or not to stop the tick deeper in the idle loop and in particular after running the idle state selection in the path where the idle governor is invoked. This way the problem can be avoided, because the idle duration predicted by the idle governor can be used to decide whether or not to stop the tick so that the tick is only stopped if that value is large enough (and, consequently, the idle state selected by the governor is deep enough). The series tires to avoid adding too much new code, rather reorder the existing code and make it more fine-grained. Patch 1 prepares the tick-sched code for the subsequent modifications and it doesn't change the code's functionality (at least not intentionally). Patch 2 starts pushing the tick stopping decision deeper into the idle loop, but that is limited to do_idle() and tick_nohz_irq_exit(). Patch 3 makes cpuidle_idle_call() decide whether or not to stop the tick and sets the stage for the subsequent changes. Patch 4 adds a bool pointer argument to cpuidle_select() and the ->select governor callback allowing them to return a "nohz" hint on whether or not to stop the tick to the caller. It also adds code to decide what value to return as "nohz" to the menu governor. Patch 5 reorders the idle state selection with respect to the stopping of the tick and causes the additional "nohz" hint from cpuidle_select() to be used for deciding whether or not to stop the tick. Patch 6 causes the menu governor to refine the state selection in case the tick is not going to be stopped and the already selected state may not fit before the next tick time. Patch 7 Deals with the situation in which the tick was stopped previously, but the idle governor still predicts short idle. This series is complementary to the poll_idle() patch at https://patchwork.kernel.org/patch/10282237/ Thanks, Rafael -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-37773 E-Mail: thomas.ils...@tu-dresden.de smime.p7s Description: S/MIME Cryptographic Signature
Re: [RFC/RFT][PATCH 6/7] sched: idle: Predict idle duration before stopping the tick
On 2018-03-04 23:28, Rafael J. Wysocki wrote: use the expected idle period duration returned by cpuidle_select() to tell tick_nohz_idle_go_idle() whether or not to stop the tick. I assume that at the point of going idle, the actual next scheduling tick may happen anywhere between now and 1/HZ. If there is a mechanism that somehow ensures that the next scheduling tick always happens 1/HZ after going idle, then some of my arguments are invalid. Ideally, the decision whether to disable the sched tick should primarily depend on the order of tree upcoming events: the the sched tick, the next non-sched timer, and the heuristic prediction: https://marc.info/?l=linux-pm=151384941425947=2 If I read the code correctly, there is already logic deep within __tick_nohz_idle_enter that prevents disabling the sched tick when it is scheduled to happen after another timer, which is a good primary condition for not stopping the sched tick. However the newly added condition prevents stopping the sched tick in more cases where it is undesirable. Assume duration_us is slightly less than USEC_PER_SEC / HZ. and next sched tick will happen in 0.1 * USEC_PER_SEC / HZ If the prediction was accurate, the cpu will be woken up way too soon by the not-disabled sched tick. I fear that might even create positive feedback loops on the heuristic, which will take into account the sleep durations for sched tick wakeups in sort of a self fulfilling prophecy: 1) The heuristic predicts to wake up in less than a full sched period, 2) The sched tick is kept enabled 3) The sched tick wakes up the system in less than a full sched period 4) Repeat Even when sleeping for longer than target_residency of the deepest sleep state, you can still improve energy consumption by sleeping longer whenever possible. On the opposite side - undesirable shallow sleeps - the proposed patch will basically always keep the tick enabled if there is a higher sleep state with a target_residency <= 1/HZ. On systems with relatively low target_residencies, such as the ones that I am primarily investigating, this should effectively prevent long shallow sleeps. However, on mobile systems with C10 states > 5 ms the sched tick is not a suitable fallback timer for preventing these issues. Well, maybe the timer itself could be used, but with a larger expiry. So IMHO - the precise timer and vague heuristic should not be mixed - decisions should preferably use actual time points rather than the generic tick duration and residency time. - for some cases the sched tick as is may not be sufficient as fallback Question: Does disabling a timer on a cpu guarantee that this cpu will wake-up or is there a scenario where a timer is deleted or moved externally without the cpu having a chance to change it's idle state?
Re: [RFC/RFT][PATCH 6/7] sched: idle: Predict idle duration before stopping the tick
On 2018-03-04 23:28, Rafael J. Wysocki wrote: use the expected idle period duration returned by cpuidle_select() to tell tick_nohz_idle_go_idle() whether or not to stop the tick. I assume that at the point of going idle, the actual next scheduling tick may happen anywhere between now and 1/HZ. If there is a mechanism that somehow ensures that the next scheduling tick always happens 1/HZ after going idle, then some of my arguments are invalid. Ideally, the decision whether to disable the sched tick should primarily depend on the order of tree upcoming events: the the sched tick, the next non-sched timer, and the heuristic prediction: https://marc.info/?l=linux-pm=151384941425947=2 If I read the code correctly, there is already logic deep within __tick_nohz_idle_enter that prevents disabling the sched tick when it is scheduled to happen after another timer, which is a good primary condition for not stopping the sched tick. However the newly added condition prevents stopping the sched tick in more cases where it is undesirable. Assume duration_us is slightly less than USEC_PER_SEC / HZ. and next sched tick will happen in 0.1 * USEC_PER_SEC / HZ If the prediction was accurate, the cpu will be woken up way too soon by the not-disabled sched tick. I fear that might even create positive feedback loops on the heuristic, which will take into account the sleep durations for sched tick wakeups in sort of a self fulfilling prophecy: 1) The heuristic predicts to wake up in less than a full sched period, 2) The sched tick is kept enabled 3) The sched tick wakes up the system in less than a full sched period 4) Repeat Even when sleeping for longer than target_residency of the deepest sleep state, you can still improve energy consumption by sleeping longer whenever possible. On the opposite side - undesirable shallow sleeps - the proposed patch will basically always keep the tick enabled if there is a higher sleep state with a target_residency <= 1/HZ. On systems with relatively low target_residencies, such as the ones that I am primarily investigating, this should effectively prevent long shallow sleeps. However, on mobile systems with C10 states > 5 ms the sched tick is not a suitable fallback timer for preventing these issues. Well, maybe the timer itself could be used, but with a larger expiry. So IMHO - the precise timer and vague heuristic should not be mixed - decisions should preferably use actual time points rather than the generic tick duration and residency time. - for some cases the sched tick as is may not be sufficient as fallback Question: Does disabling a timer on a cpu guarantee that this cpu will wake-up or is there a scenario where a timer is deleted or moved externally without the cpu having a chance to change it's idle state?
Re: [PATCH] cpuidle: Add "cpuidle.use_deepest" to bypass governor and allow HW to go deep
On 2017-11-09 08:38, Len Brown wrote: From: Len BrownWhile there are several mechanisms (cmdline, sysfs, PM_QOS) to limit cpuidle to shallow idle states, there is no simple mechanism to give the hardware permission to enter the deeptest state permitted by PM_QOS. Here we create the "cpuidle.use_deepest" modparam to provide this capability. "cpuidle.use_deepest=Y" can be set at boot-time, and /sys/module/cpuidle/use_deepest can be modified (with Y/N) at run-time. This is a good option to have and can conveniently prevent idle power consumption issues. But that wouldn't be a reasonable default, would it? I still think there is an inherent need for a heuristic and a corresponding mitigation to avoid staying in a sleep state too long. Best, Thomas
Re: [PATCH] cpuidle: Add "cpuidle.use_deepest" to bypass governor and allow HW to go deep
On 2017-11-09 08:38, Len Brown wrote: From: Len Brown While there are several mechanisms (cmdline, sysfs, PM_QOS) to limit cpuidle to shallow idle states, there is no simple mechanism to give the hardware permission to enter the deeptest state permitted by PM_QOS. Here we create the "cpuidle.use_deepest" modparam to provide this capability. "cpuidle.use_deepest=Y" can be set at boot-time, and /sys/module/cpuidle/use_deepest can be modified (with Y/N) at run-time. This is a good option to have and can conveniently prevent idle power consumption issues. But that wouldn't be a reasonable default, would it? I still think there is an inherent need for a heuristic and a corresponding mitigation to avoid staying in a sleep state too long. Best, Thomas
[tip:perf/core] perf tools: Default to python version 2
Commit-ID: d6a947fb6cdff3a19db93895c746f70b5903a965 Gitweb: http://git.kernel.org/tip/d6a947fb6cdff3a19db93895c746f70b5903a965 Author: Thomas Ilsche AuthorDate: Mon, 4 Aug 2014 15:03:15 +0200 Committer: Arnaldo Carvalho de Melo CommitDate: Tue, 12 Aug 2014 12:03:08 -0300 perf tools: Default to python version 2 According to PEP 394 recommendation [1], it's more portable to use python2 rather than plain python to refer python binary version 2. Since there're distros using python3 by default like Arch, and we don't support python3 (yet), it'd be better using python2 explicitly. But older versions (prior to 2.7) seem not to provide python2 but just python. Given that it's only old version, try python2 first and then fallback to python. It'll ensure that it always points to python 2.x. I tested (compiles and perf script runs) with the combinations: 1) python -> python2.x, python-config -> python2.x-config python2 N/A, python2-config N/A 2) python -> python3.x, python-config -> python3.x-config python2 -> python2.x, python2-config -> python2.x-config 3) python -> python2.x, python-config -> python2.x-config python2 -> python2.x, python2-config -> python2.x-config 4) python -> python2.x, python-config -> python2.x-config python2 -> python2.x, python2-config N/A Based on / replaces the patch 2/2 by Namhyung Kim. [1] https://www.python.org/dev/peps/pep-0394 Based-on-patch-by: Namhyung Kim Signed-off-by: Thomas Ilsche Cc: Ingo Molnar Cc: Jiri Olsa Cc: Namhyung Kim Cc: Paul Mackerras Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/53df8493.6070...@tu-dresden.de Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/config/Makefile | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index e05d8f9..75d4c23 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -121,10 +121,16 @@ ifdef PARSER_DEBUG endif ifndef NO_LIBPYTHON - override PYTHON := \ -$(call get-executable-or-default,PYTHON,python) + # Try different combinations to accommodate systems that only have + # python[2][-config] in weird combinations but always preferring + # python2 and python2-config as per pep-0394. If we catch a + # python[-config] in version 3, the version check will kill it. + PYTHON2 := $(if $(call get-executable,python2),python2,python) + override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) + PYTHON2_CONFIG := \ +$(if $(call get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config) override PYTHON_CONFIG := \ -$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) +$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG)) PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG)) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:perf/core] perf tools: Default to python version 2
Commit-ID: d6a947fb6cdff3a19db93895c746f70b5903a965 Gitweb: http://git.kernel.org/tip/d6a947fb6cdff3a19db93895c746f70b5903a965 Author: Thomas Ilsche thomas.ils...@tu-dresden.de AuthorDate: Mon, 4 Aug 2014 15:03:15 +0200 Committer: Arnaldo Carvalho de Melo a...@redhat.com CommitDate: Tue, 12 Aug 2014 12:03:08 -0300 perf tools: Default to python version 2 According to PEP 394 recommendation [1], it's more portable to use python2 rather than plain python to refer python binary version 2. Since there're distros using python3 by default like Arch, and we don't support python3 (yet), it'd be better using python2 explicitly. But older versions (prior to 2.7) seem not to provide python2 but just python. Given that it's only old version, try python2 first and then fallback to python. It'll ensure that it always points to python 2.x. I tested (compiles and perf script runs) with the combinations: 1) python - python2.x, python-config - python2.x-config python2 N/A, python2-config N/A 2) python - python3.x, python-config - python3.x-config python2 - python2.x, python2-config - python2.x-config 3) python - python2.x, python-config - python2.x-config python2 - python2.x, python2-config - python2.x-config 4) python - python2.x, python-config - python2.x-config python2 - python2.x, python2-config N/A Based on / replaces the patch 2/2 by Namhyung Kim. [1] https://www.python.org/dev/peps/pep-0394 Based-on-patch-by: Namhyung Kim namhy...@kernel.org Signed-off-by: Thomas Ilsche thomas.ils...@tu-dresden.de Cc: Ingo Molnar mi...@kernel.org Cc: Jiri Olsa jo...@redhat.com Cc: Namhyung Kim namhy...@kernel.org Cc: Paul Mackerras pau...@samba.org Cc: Peter Zijlstra a.p.zijls...@chello.nl Link: http://lkml.kernel.org/r/53df8493.6070...@tu-dresden.de Signed-off-by: Arnaldo Carvalho de Melo a...@redhat.com --- tools/perf/config/Makefile | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index e05d8f9..75d4c23 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -121,10 +121,16 @@ ifdef PARSER_DEBUG endif ifndef NO_LIBPYTHON - override PYTHON := \ -$(call get-executable-or-default,PYTHON,python) + # Try different combinations to accommodate systems that only have + # python[2][-config] in weird combinations but always preferring + # python2 and python2-config as per pep-0394. If we catch a + # python[-config] in version 3, the version check will kill it. + PYTHON2 := $(if $(call get-executable,python2),python2,python) + override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) + PYTHON2_CONFIG := \ +$(if $(call get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config) override PYTHON_CONFIG := \ -$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) +$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG)) PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG)) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v3 2/2] perf tools: Default to python version 2
On 2014-08-02 15:20, Arnaldo Carvalho de Melo wrote: > Maybe the way python was packaged on f14 is problematic, as you > describe, but the fact is that python support is not working on that > platform after this patch. > > Could you please try to prepare a patch that does as you suggest so that > I can try? Please try the patch below. I tested (compiles and perf script runs) with the combinations: 1) python -> python2.x, python-config -> python2.x-config python2 N/A, python2-config N/A 2) python -> python3.x, python-config -> python3.x-config python2 -> python2.x, python2-config -> python2.x-config 3) python -> python2.x, python-config -> python2.x-config python2 -> python2.x, python2-config -> python2.x-config 4) python -> python2.x, python-config -> python2.x-config python2 -> python2.x, python2-config N/A Based on / replaces the patch 2/2 by Namhyung Kim. Signed-off-by: Thomas Ilsche --- diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 9ee2008..0528cd37 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -104,10 +104,16 @@ ifdef PARSER_DEBUG endif ifndef NO_LIBPYTHON - override PYTHON := \ -$(call get-executable-or-default,PYTHON,python) + # Try different combinations to accommodate systems that only have + # python[2][-config] in weird combinations but always preferring + # python2 and python2-config as per pep-0394. If we catch a + # python[-config] in version 3, the version check will kill it. + PYTHON2 := $(if $(call get-executable,python2),python2,python) + override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) + PYTHON2_CONFIG := \ +$(if $(call get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config) override PYTHON_CONFIG := \ -$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) +$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG)) PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG)) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v3 2/2] perf tools: Default to python version 2
On 2014-08-02 15:20, Arnaldo Carvalho de Melo wrote: Maybe the way python was packaged on f14 is problematic, as you describe, but the fact is that python support is not working on that platform after this patch. Could you please try to prepare a patch that does as you suggest so that I can try? Please try the patch below. I tested (compiles and perf script runs) with the combinations: 1) python - python2.x, python-config - python2.x-config python2 N/A, python2-config N/A 2) python - python3.x, python-config - python3.x-config python2 - python2.x, python2-config - python2.x-config 3) python - python2.x, python-config - python2.x-config python2 - python2.x, python2-config - python2.x-config 4) python - python2.x, python-config - python2.x-config python2 - python2.x, python2-config N/A Based on / replaces the patch 2/2 by Namhyung Kim. Signed-off-by: Thomas Ilsche thomas.ils...@tu-dresden.de --- diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 9ee2008..0528cd37 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -104,10 +104,16 @@ ifdef PARSER_DEBUG endif ifndef NO_LIBPYTHON - override PYTHON := \ -$(call get-executable-or-default,PYTHON,python) + # Try different combinations to accommodate systems that only have + # python[2][-config] in weird combinations but always preferring + # python2 and python2-config as per pep-0394. If we catch a + # python[-config] in version 3, the version check will kill it. + PYTHON2 := $(if $(call get-executable,python2),python2,python) + override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) + PYTHON2_CONFIG := \ +$(if $(call get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config) override PYTHON_CONFIG := \ -$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) +$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG)) PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG)) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 2/2] perf tools: Default to python version 2
Zitat von Arnaldo Carvalho de Melo : > Em Tue, Jul 29, 2014 at 03:57:20PM +0900, Namhyung Kim escreveu: >> According to PEP 394 recommendation [1], it's more portable to use >> python2 rather than plain python to refer python binary version 2. >> >> Since there're distros using python3 by default like Arch, and we >> don't support python3 (yet), it'd be better using python2 explicitly. >> >> But older versions (prior to 2.7) seem not to provide python2 but just >> python. Given that it's only old version, try python2 first and then >> fallback to python. It'll ensure that it always points to python 2.x. > > It should fallback, right? Yes. > [acme@fedora14 linux]$ ls -la /usr/bin/python2-config > ls: cannot access /usr/bin/python2-config: No such file or directory > [acme@fedora14 linux]$ ls -la /usr/bin/python-config > lrwxrwxrwx. 1 root root 16 Mar 25 09:43 /usr/bin/python-config -> > python2.7-config Appearently this fedora package symlinks python -> python2 but does not so for python2-config. The Makefile looks for either python2 or python as fallback and then assumes that a respective "-config" exists. I think this is a sane assumption. I guess if this needs to be supported, there needs to be another fallback if $(PYTHON)-config is not found to try python-config again. However this may behave weirdly if someone has python2, no python2-devel but python3-devel installed, sets PYTHON=python2 and then still gets a version error. Or we somehow only fallback if no PYTHON is specified (not sure how to implemented that elegantly). Best, Thomas > [acme@fedora14 linux]$ rpm -qf /usr/bin/python-config > python-devel-2.7-8.fc14.1.x86_64 > [acme@fedora14 linux]$ cat /etc/fedora-release > Fedora release 14 (Laughlin) > [acme@fedora14 linux]$ > > [acme@fedora14 linux]$ time make O=/tmp/build/perf -C tools/perf install > make: Entering directory `/home/acme/git/linux/tools/perf' > BUILD: Doing 'make -j4' parallel build > config/Makefile:126: The path '/usr/bin/python2-config' is not executable. > config/Makefile:339: No libdw DWARF unwind found, Please install > elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR > config/Makefile:481: Missing perl devel files. Disabling perl scripting > support, consider installing perl-ExtUtils-Embed > config/Makefile:512: No python-config tool was found > config/Makefile:512: Python support will not be built > >> [1] https://www.python.org/dev/peps/pep-0394 >> >> Suggested-by: Thomas Ilsche >> Tested-by: Thomas Ilsche >> Signed-off-by: Namhyung Kim >> --- >> tools/perf/config/Makefile | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile >> index e05d8f99424d..60177278a357 100644 >> --- a/tools/perf/config/Makefile >> +++ b/tools/perf/config/Makefile >> @@ -121,8 +121,8 @@ ifdef PARSER_DEBUG >> endif >> >> ifndef NO_LIBPYTHON >> - override PYTHON := \ >> -$(call get-executable-or-default,PYTHON,python) >> + PYTHON2 := $(if $(call get-executable,python2),python2,python) >> + override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) >>override PYTHON_CONFIG := \ >> $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) >> >> -- >> 2.0.0 smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH v2 2/2] perf tools: Default to python version 2
Zitat von Arnaldo Carvalho de Melo a...@kernel.org: Em Tue, Jul 29, 2014 at 03:57:20PM +0900, Namhyung Kim escreveu: According to PEP 394 recommendation [1], it's more portable to use python2 rather than plain python to refer python binary version 2. Since there're distros using python3 by default like Arch, and we don't support python3 (yet), it'd be better using python2 explicitly. But older versions (prior to 2.7) seem not to provide python2 but just python. Given that it's only old version, try python2 first and then fallback to python. It'll ensure that it always points to python 2.x. It should fallback, right? Yes. [acme@fedora14 linux]$ ls -la /usr/bin/python2-config ls: cannot access /usr/bin/python2-config: No such file or directory [acme@fedora14 linux]$ ls -la /usr/bin/python-config lrwxrwxrwx. 1 root root 16 Mar 25 09:43 /usr/bin/python-config - python2.7-config Appearently this fedora package symlinks python - python2 but does not so for python2-config. The Makefile looks for either python2 or python as fallback and then assumes that a respective -config exists. I think this is a sane assumption. I guess if this needs to be supported, there needs to be another fallback if $(PYTHON)-config is not found to try python-config again. However this may behave weirdly if someone has python2, no python2-devel but python3-devel installed, sets PYTHON=python2 and then still gets a version error. Or we somehow only fallback if no PYTHON is specified (not sure how to implemented that elegantly). Best, Thomas [acme@fedora14 linux]$ rpm -qf /usr/bin/python-config python-devel-2.7-8.fc14.1.x86_64 [acme@fedora14 linux]$ cat /etc/fedora-release Fedora release 14 (Laughlin) [acme@fedora14 linux]$ [acme@fedora14 linux]$ time make O=/tmp/build/perf -C tools/perf install make: Entering directory `/home/acme/git/linux/tools/perf' BUILD: Doing 'make -j4' parallel build config/Makefile:126: The path '/usr/bin/python2-config' is not executable. config/Makefile:339: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev = 0.158 and/or set LIBDW_DIR config/Makefile:481: Missing perl devel files. Disabling perl scripting support, consider installing perl-ExtUtils-Embed config/Makefile:512: No python-config tool was found config/Makefile:512: Python support will not be built [1] https://www.python.org/dev/peps/pep-0394 Suggested-by: Thomas Ilsche thomas.ils...@tu-dresden.de Tested-by: Thomas Ilsche thomas.ils...@tu-dresden.de Signed-off-by: Namhyung Kim namhy...@kernel.org --- tools/perf/config/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index e05d8f99424d..60177278a357 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -121,8 +121,8 @@ ifdef PARSER_DEBUG endif ifndef NO_LIBPYTHON - override PYTHON := \ -$(call get-executable-or-default,PYTHON,python) + PYTHON2 := $(if $(call get-executable,python2),python2,python) + override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) override PYTHON_CONFIG := \ $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) -- 2.0.0 smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH 2/2] perf tools: Default to python version 2
Hi Namhyung, >>> I'm thinking about something like below.. but sadly it doesn't work for >>> me.. hmm. >> >> Actually this appears to work for me (with 2.6.9 & 2.7.6) and I >> find this solution more >> elegant. > > Thanks for testing. It was my fault on setting symlink to a > non-existing file for testing. It now works well for me too. > > Can I add your Tested-by then? Yes Tested-by: Thomas Ilsche Plase note that your patch requires the previous patch that just set the default to python2. I tested in an Arch Linux VM with: A) no system install of python, source installation 2.6.9 in PATH B) no system install of python, source installation 2.7.6 in PATH C) python3 (as python) and python2 system installation In all cases it reports to build with libpython and perf script with a dummy Python script works. If no python whatsoever is in PATH it reports: config/Makefile:121: The path 'python' is not executable. sh: line 0: command: -c: invalid option command: usage: command [-pVv] command [arg ...] config/Makefile:122: The path '-config' is not executable. And the created perf executable does not work with the dummy script. If only pthon3 is in PATH I get the expected version error message (build stops.) Note that for the manual old installations of python I had an issue with linking order and a -Werror issue that I resolved manually. But this is fixed with the latest system install of python. Best, Thomas > > Thanks, > Namhyung smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH 2/2] perf tools: Default to python version 2
Hi Namhyung, I'm thinking about something like below.. but sadly it doesn't work for me.. hmm. Actually this appears to work for me (with 2.6.9 2.7.6) and I find this solution more elegant. Thanks for testing. It was my fault on setting symlink to a non-existing file for testing. It now works well for me too. Can I add your Tested-by then? Yes Tested-by: Thomas Ilsche thomas.ils...@tu-dresden.de Plase note that your patch requires the previous patch that just set the default to python2. I tested in an Arch Linux VM with: A) no system install of python, source installation 2.6.9 in PATH B) no system install of python, source installation 2.7.6 in PATH C) python3 (as python) and python2 system installation In all cases it reports to build with libpython and perf script with a dummy Python script works. If no python whatsoever is in PATH it reports: config/Makefile:121: The path 'python' is not executable. sh: line 0: command: -c: invalid option command: usage: command [-pVv] command [arg ...] config/Makefile:122: The path '-config' is not executable. And the created perf executable does not work with the dummy script. If only pthon3 is in PATH I get the expected version error message (build stops.) Note that for the manual old installations of python I had an issue with linking order and a -Werror issue that I resolved manually. But this is fixed with the latest system install of python. Best, Thomas Thanks, Namhyung smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH 2/2] perf tools: Default to python version 2
Hi Namhyung, apologies for mixing up your first name earlier. On 2014-07-25 12:24, Namhyung Kim wrote: > Hi Thomas, > > On Fri, 25 Jul 2014 11:28:55 +0200, Thomas Ilsche wrote: >> Hi Kim, >> >> On 2014-07-25 05:14, Namhyung Kim wrote: >>> According to PEP 394 recommendation [1], it's more portable to use >>> python2 rather than plain python to refer python binary version 2. >> >> I tried to find out how backwards-compatible this is. python2(-config) >> was first available in Python 2.7.3 (April 2012), but it is still not >> available in 2.6.9 (Oct. 2013). So it might be better to use python2 >> as default but fall back to python if python2 is not available. >> >> Best, >> Thomas >> >> Signed-off-by: Thomas Ilsche >> --- >> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile >> index f4f7f58..1b1bc9c 100644 >> --- a/tools/perf/config/Makefile >> +++ b/tools/perf/config/Makefile >> @@ -117,8 +117,8 @@ ifdef PARSER_DEBUG >> endif >> >> ifndef NO_LIBPYTHON >> - override PYTHON := \ >> -$(call get-executable-or-default,PYTHON,python) >> + PYTHON2 := $(call get-executable-or-default,PYTHON,python2) > > But wouldn't it set PYTHON2 to python2 even if the system doesn't have > python2 actually? > > >> + override PYTHON := $(if $(PYTHON2),$(PYTHON2),$(call >> get-executable,python)) > > And then it'll set PYTHON to python2, no? > No, get-executable-or-default only returns anything if it is an actual executable (asserted by a 'command -v' and 'test -f $ -a -x $'). At least this is how I understand the Makefile code. I also tested it with a Python 2.6.9 (no python2 in path) and it worked (after fixing some totally unrelated issues in python headers). It will however complain that "The path 'python2' is not executable." before using python. > >>override PYTHON_CONFIG := \ >> $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) > > > I'm thinking about something like below.. but sadly it doesn't work for > me.. hmm. Actually this appears to work for me (with 2.6.9 & 2.7.6) and I find this solution more elegant. Best, Thomas > > Thanks, > Namhyung > > > > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile > index 0f4b7fbc4852..60177278a357 100644 > --- a/tools/perf/config/Makefile > +++ b/tools/perf/config/Makefile > @@ -121,8 +121,8 @@ ifdef PARSER_DEBUG > endif > > ifndef NO_LIBPYTHON > - override PYTHON := \ > -$(call get-executable-or-default,PYTHON,python2) > + PYTHON2 := $(if $(call get-executable,python2),python2,python) > + override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) >override PYTHON_CONFIG := \ > $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) > > -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-3773 E-Mail: thomas.ils...@tu-dresden.de smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH 2/2] perf tools: Default to python version 2
Hi Kim, On 2014-07-25 05:14, Namhyung Kim wrote: > According to PEP 394 recommendation [1], it's more portable to use > python2 rather than plain python to refer python binary version 2. I tried to find out how backwards-compatible this is. python2(-config) was first available in Python 2.7.3 (April 2012), but it is still not available in 2.6.9 (Oct. 2013). So it might be better to use python2 as default but fall back to python if python2 is not available. Best, Thomas Signed-off-by: Thomas Ilsche --- diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index f4f7f58..1b1bc9c 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -117,8 +117,8 @@ ifdef PARSER_DEBUG endif ifndef NO_LIBPYTHON - override PYTHON := \ -$(call get-executable-or-default,PYTHON,python) + PYTHON2 := $(call get-executable-or-default,PYTHON,python2) + override PYTHON := $(if $(PYTHON2),$(PYTHON2),$(call get-executable,python)) override PYTHON_CONFIG := \ $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] perf tools: Fix make PYTHON override
Hi Namhyung, On 2014-07-25 05:14, Namhyung Kim wrote: > Thomas reported that make PYTHON=python2 is not work on some systems. > I can reproduce it on my ArchLinux box too. > > This is because it's overridden by config/feature-checks/Makefile > regardless of PYTHON setting. I guess it's a bug slipped into during > the feature checking change. > > Actually, we don't need to check python-config in the feature-checks. > We can just pass appropriate FEATURE_CHECK_*FLAGS. The patch works fine for me. Thanks, Thomas > > Reported-by: Thomas Ilsche > Signed-off-by: Namhyung Kim > --- > tools/perf/config/Makefile| 34 > --- > tools/perf/config/feature-checks/Makefile | 18 ++-- > 2 files changed, 24 insertions(+), 28 deletions(-) > > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile > index 1f67aa02d240..e05d8f99424d 100644 > --- a/tools/perf/config/Makefile > +++ b/tools/perf/config/Makefile > @@ -120,6 +120,23 @@ ifdef PARSER_DEBUG >CFLAGS += -DPARSER_DEBUG > endif > > +ifndef NO_LIBPYTHON > + override PYTHON := \ > +$(call get-executable-or-default,PYTHON,python) > + override PYTHON_CONFIG := \ > +$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) > + > + PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG)) > + > + PYTHON_EMBED_LDOPTS := $(shell $(PYTHON_CONFIG_SQ) --ldflags 2>/dev/null) > + PYTHON_EMBED_CCOPTS := $(shell $(PYTHON_CONFIG_SQ) --cflags 2>/dev/null) > + > + FEATURE_CHECK_CFLAGS-libpython := $(PYTHON_EMBED_CCOPTS) > + FEATURE_CHECK_LDFLAGS-libpython := $(PYTHON_EMBED_LDOPTS) > + FEATURE_CHECK_CFLAGS-libpython-version := $(PYTHON_EMBED_CCOPTS) > + FEATURE_CHECK_LDFLAGS-libpython-version := $(PYTHON_EMBED_LDOPTS) > +endif > + > CFLAGS += -fno-omit-frame-pointer > CFLAGS += -ggdb3 > CFLAGS += -funwind-tables > @@ -482,21 +499,14 @@ define disable-python_code >NO_LIBPYTHON := 1 > endef > > -override PYTHON := \ > - $(call get-executable-or-default,PYTHON,python) > - > -ifndef PYTHON > - $(call disable-python,python interpreter) > +ifdef NO_LIBPYTHON > + $(call disable-python) > else > > - PYTHON_WORD := $(call shell-wordify,$(PYTHON)) > - > - ifdef NO_LIBPYTHON > -$(call disable-python) > + ifndef PYTHON > +$(call disable-python,python interpreter) >else > - > -override PYTHON_CONFIG := \ > - $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) > +PYTHON_WORD := $(call shell-wordify,$(PYTHON)) > > ifndef PYTHON_CONFIG >$(call disable-python,python-config tool) > diff --git a/tools/perf/config/feature-checks/Makefile > b/tools/perf/config/feature-checks/Makefile > index 6088f8d8a434..72ab2984718e 100644 > --- a/tools/perf/config/feature-checks/Makefile > +++ b/tools/perf/config/feature-checks/Makefile > @@ -101,25 +101,11 @@ FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) > $(PERL_EMBED_LDOPTS) > test-libperl.bin: > $(BUILD) $(FLAGS_PERL_EMBED) > > -override PYTHON := python > -override PYTHON_CONFIG := python-config > - > -escape-for-shell-sq = $(subst ','\'',$(1)) > -shell-sq = '$(escape-for-shell-sq)' > - > -PYTHON_CONFIG_SQ = $(call shell-sq,$(PYTHON_CONFIG)) > - > -PYTHON_EMBED_LDOPTS = $(shell $(PYTHON_CONFIG_SQ) --ldflags 2>/dev/null) > -PYTHON_EMBED_LDFLAGS = $(call strip-libs,$(PYTHON_EMBED_LDOPTS)) > -PYTHON_EMBED_LIBADD = $(call grep-libs,$(PYTHON_EMBED_LDOPTS)) > -PYTHON_EMBED_CCOPTS = $(shell $(PYTHON_CONFIG_SQ) --cflags 2>/dev/null) > -FLAGS_PYTHON_EMBED = $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS) > - > test-libpython.bin: > - $(BUILD) $(FLAGS_PYTHON_EMBED) > + $(BUILD) > > test-libpython-version.bin: > - $(BUILD) $(FLAGS_PYTHON_EMBED) > + $(BUILD) > > test-libbfd.bin: > $(BUILD) -DPACKAGE='"perf"' -lbfd -lz -liberty -ldl > -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-3773 E-Mail: thomas.ils...@tu-dresden.de smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH 1/2] perf tools: Fix make PYTHON override
Hi Namhyung, On 2014-07-25 05:14, Namhyung Kim wrote: Thomas reported that make PYTHON=python2 is not work on some systems. I can reproduce it on my ArchLinux box too. This is because it's overridden by config/feature-checks/Makefile regardless of PYTHON setting. I guess it's a bug slipped into during the feature checking change. Actually, we don't need to check python-config in the feature-checks. We can just pass appropriate FEATURE_CHECK_*FLAGS. The patch works fine for me. Thanks, Thomas Reported-by: Thomas Ilsche thomas.ils...@tu-dresden.de Signed-off-by: Namhyung Kim namhy...@kernel.org --- tools/perf/config/Makefile| 34 --- tools/perf/config/feature-checks/Makefile | 18 ++-- 2 files changed, 24 insertions(+), 28 deletions(-) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 1f67aa02d240..e05d8f99424d 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -120,6 +120,23 @@ ifdef PARSER_DEBUG CFLAGS += -DPARSER_DEBUG endif +ifndef NO_LIBPYTHON + override PYTHON := \ +$(call get-executable-or-default,PYTHON,python) + override PYTHON_CONFIG := \ +$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) + + PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG)) + + PYTHON_EMBED_LDOPTS := $(shell $(PYTHON_CONFIG_SQ) --ldflags 2/dev/null) + PYTHON_EMBED_CCOPTS := $(shell $(PYTHON_CONFIG_SQ) --cflags 2/dev/null) + + FEATURE_CHECK_CFLAGS-libpython := $(PYTHON_EMBED_CCOPTS) + FEATURE_CHECK_LDFLAGS-libpython := $(PYTHON_EMBED_LDOPTS) + FEATURE_CHECK_CFLAGS-libpython-version := $(PYTHON_EMBED_CCOPTS) + FEATURE_CHECK_LDFLAGS-libpython-version := $(PYTHON_EMBED_LDOPTS) +endif + CFLAGS += -fno-omit-frame-pointer CFLAGS += -ggdb3 CFLAGS += -funwind-tables @@ -482,21 +499,14 @@ define disable-python_code NO_LIBPYTHON := 1 endef -override PYTHON := \ - $(call get-executable-or-default,PYTHON,python) - -ifndef PYTHON - $(call disable-python,python interpreter) +ifdef NO_LIBPYTHON + $(call disable-python) else - PYTHON_WORD := $(call shell-wordify,$(PYTHON)) - - ifdef NO_LIBPYTHON -$(call disable-python) + ifndef PYTHON +$(call disable-python,python interpreter) else - -override PYTHON_CONFIG := \ - $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) +PYTHON_WORD := $(call shell-wordify,$(PYTHON)) ifndef PYTHON_CONFIG $(call disable-python,python-config tool) diff --git a/tools/perf/config/feature-checks/Makefile b/tools/perf/config/feature-checks/Makefile index 6088f8d8a434..72ab2984718e 100644 --- a/tools/perf/config/feature-checks/Makefile +++ b/tools/perf/config/feature-checks/Makefile @@ -101,25 +101,11 @@ FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS) test-libperl.bin: $(BUILD) $(FLAGS_PERL_EMBED) -override PYTHON := python -override PYTHON_CONFIG := python-config - -escape-for-shell-sq = $(subst ','\'',$(1)) -shell-sq = '$(escape-for-shell-sq)' - -PYTHON_CONFIG_SQ = $(call shell-sq,$(PYTHON_CONFIG)) - -PYTHON_EMBED_LDOPTS = $(shell $(PYTHON_CONFIG_SQ) --ldflags 2/dev/null) -PYTHON_EMBED_LDFLAGS = $(call strip-libs,$(PYTHON_EMBED_LDOPTS)) -PYTHON_EMBED_LIBADD = $(call grep-libs,$(PYTHON_EMBED_LDOPTS)) -PYTHON_EMBED_CCOPTS = $(shell $(PYTHON_CONFIG_SQ) --cflags 2/dev/null) -FLAGS_PYTHON_EMBED = $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS) - test-libpython.bin: - $(BUILD) $(FLAGS_PYTHON_EMBED) + $(BUILD) test-libpython-version.bin: - $(BUILD) $(FLAGS_PYTHON_EMBED) + $(BUILD) test-libbfd.bin: $(BUILD) -DPACKAGE='perf' -lbfd -lz -liberty -ldl -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-3773 E-Mail: thomas.ils...@tu-dresden.de smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH 2/2] perf tools: Default to python version 2
Hi Kim, On 2014-07-25 05:14, Namhyung Kim wrote: According to PEP 394 recommendation [1], it's more portable to use python2 rather than plain python to refer python binary version 2. I tried to find out how backwards-compatible this is. python2(-config) was first available in Python 2.7.3 (April 2012), but it is still not available in 2.6.9 (Oct. 2013). So it might be better to use python2 as default but fall back to python if python2 is not available. Best, Thomas Signed-off-by: Thomas Ilsche thomas.ils...@tu-dresden.de --- diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index f4f7f58..1b1bc9c 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -117,8 +117,8 @@ ifdef PARSER_DEBUG endif ifndef NO_LIBPYTHON - override PYTHON := \ -$(call get-executable-or-default,PYTHON,python) + PYTHON2 := $(call get-executable-or-default,PYTHON,python2) + override PYTHON := $(if $(PYTHON2),$(PYTHON2),$(call get-executable,python)) override PYTHON_CONFIG := \ $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] perf tools: Default to python version 2
Hi Namhyung, apologies for mixing up your first name earlier. On 2014-07-25 12:24, Namhyung Kim wrote: Hi Thomas, On Fri, 25 Jul 2014 11:28:55 +0200, Thomas Ilsche wrote: Hi Kim, On 2014-07-25 05:14, Namhyung Kim wrote: According to PEP 394 recommendation [1], it's more portable to use python2 rather than plain python to refer python binary version 2. I tried to find out how backwards-compatible this is. python2(-config) was first available in Python 2.7.3 (April 2012), but it is still not available in 2.6.9 (Oct. 2013). So it might be better to use python2 as default but fall back to python if python2 is not available. Best, Thomas Signed-off-by: Thomas Ilsche thomas.ils...@tu-dresden.de --- diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index f4f7f58..1b1bc9c 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -117,8 +117,8 @@ ifdef PARSER_DEBUG endif ifndef NO_LIBPYTHON - override PYTHON := \ -$(call get-executable-or-default,PYTHON,python) + PYTHON2 := $(call get-executable-or-default,PYTHON,python2) But wouldn't it set PYTHON2 to python2 even if the system doesn't have python2 actually? + override PYTHON := $(if $(PYTHON2),$(PYTHON2),$(call get-executable,python)) And then it'll set PYTHON to python2, no? No, get-executable-or-default only returns anything if it is an actual executable (asserted by a 'command -v' and 'test -f $ -a -x $'). At least this is how I understand the Makefile code. I also tested it with a Python 2.6.9 (no python2 in path) and it worked (after fixing some totally unrelated issues in python headers). It will however complain that The path 'python2' is not executable. before using python. override PYTHON_CONFIG := \ $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) I'm thinking about something like below.. but sadly it doesn't work for me.. hmm. Actually this appears to work for me (with 2.6.9 2.7.6) and I find this solution more elegant. Best, Thomas Thanks, Namhyung diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 0f4b7fbc4852..60177278a357 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -121,8 +121,8 @@ ifdef PARSER_DEBUG endif ifndef NO_LIBPYTHON - override PYTHON := \ -$(call get-executable-or-default,PYTHON,python2) + PYTHON2 := $(if $(call get-executable,python2),python2,python) + override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2)) override PYTHON_CONFIG := \ $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config) -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-3773 E-Mail: thomas.ils...@tu-dresden.de smime.p7s Description: S/MIME Cryptographic Signature
Perf: cannot overide PYTHON(_CONFIG)
Dear perf maintainers, It is documented in Makefile.perf and the Python 3 error message to use make PYTHON=python2. However this does not work anymore since 9734163b6ee1425c6fa4b65d7e6ce34c9079420d moved the libpython feature check. Further it would be more portable anyway to use "python2" as default instead of "python" as long as python3 is not supported. This follows the PEP 394 recommendation: https://www.python.org/dev/peps/pep-0394 I am not sure how to correctly fix the failure to respect PYTHON(_CONFIG). I would assume that early in perf/config/Makefile FEATURE_CHECK_[CFLAGS|LDFLAGS]-libpython(-version) needs to be set (before the feature checks are executed) and that the reconstruction of FLAGS_PYTHON_EMBED based on hardcoded python-config needs to be removed. Best Regards, Thomas -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-3773 E-Mail: thomas.ils...@tu-dresden.de -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Perf: cannot overide PYTHON(_CONFIG)
Dear perf maintainers, It is documented in Makefile.perf and the Python 3 error message to use make PYTHON=python2. However this does not work anymore since 9734163b6ee1425c6fa4b65d7e6ce34c9079420d moved the libpython feature check. Further it would be more portable anyway to use python2 as default instead of python as long as python3 is not supported. This follows the PEP 394 recommendation: https://www.python.org/dev/peps/pep-0394 I am not sure how to correctly fix the failure to respect PYTHON(_CONFIG). I would assume that early in perf/config/Makefile FEATURE_CHECK_[CFLAGS|LDFLAGS]-libpython(-version) needs to be set (before the feature checks are executed) and that the reconstruction of FLAGS_PYTHON_EMBED based on hardcoded python-config needs to be removed. Best Regards, Thomas -- Dipl. Inf. Thomas Ilsche Computer Scientist Highly Adaptive Energy-Efficient Computing CRC 912 HAEC: http://tu-dresden.de/sfb912 Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-42168 Fax: +49 351 463-3773 E-Mail: thomas.ils...@tu-dresden.de -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Provide additional sample information to Python scripts
On 2014-03-07 15:18, Arnaldo Carvalho de Melo wrote: > Em Tue, Feb 18, 2014 at 09:43:24AM +0100, Joseph Schuchart escreveu: >> Good morning, >> >> We have developed a patch for the perf Python scripting interface to >> provide additional information about the pid, tid, and cpu of generic >> events as well as information about the call-stack and resolved symbol >> names. This provides scripts with a greater level of detail. The >> mentioned information is already available to the scripting engine and >> just has to be handed down. This is done by the attached patch. The >> patch is based on Linux-3.13.3. >> >> Please let me know if you have any questions on this. > > Can you please resend, against the perf/core branch in > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git, and as an > attachement or making sure that the patch is not mangled? Joseph and I are working on this but due to vacation we expect to send you the patch with some improvements in the beginning of April. Please let us know in case there is some deadline. Thanks, Thomas > > - Arnaldo > >> Thanks >> Joseph >> -- >> Dipl. Inf. Joseph Schuchart >> Computer Scientist >> >> Technische Universität Dresden >> Center for Information Services and High Performance Computing (ZIH) >> 01062 Dresden, Germany >> >> Phone: +49 351 463-36494 >> Fax: +49 351 463-3773 >> E-Mail: joseph.schuch...@tu-dresden.de > >> Perf: Provide sample information and call-chain to Python script >> >> Provide additional sample information on generic events to Python >> scripts, including pid, tid, and cpu for which the event was recorded. >> Additionally, provide the call-stack recorded at each event with >> resolved symbols. At the moment, the pointer to the sample struct >> is passed to scripts, which seems to be of little use. The patch >> puts this information in dictionaries for easy access by Python >> scripts. >> >> Signed-off-by: Joseph Schuchart >> Acked-by: Thomas Ilsche >> >> @@ -359,7 +359,7 @@ static void python_process_general_event >> struct thread *thread, >> struct addr_location *al) >> { >> -PyObject *handler, *retval, *t, *dict; >> +PyObject *handler, *retval, *t, *dict, *dict_sample; >> static char handler_name[64]; >> unsigned n = 0; >> >> @@ -375,6 +375,10 @@ static void python_process_general_event >> if (!dict) >> Py_FatalError("couldn't create Python dictionary"); >> >> +dict_sample = PyDict_New(); >> +if (!dict_sample) >> +Py_FatalError("couldn't create Python dictionary"); >> + >> snprintf(handler_name, sizeof(handler_name), "%s", "process_event"); >> >> handler = PyDict_GetItemString(main_dict, handler_name); >> @@ -384,8 +388,76 @@ static void python_process_general_event >> pydict_set_item_string_decref(dict, "ev_name", >> PyString_FromString(perf_evsel__name(evsel))); >> pydict_set_item_string_decref(dict, "attr", PyString_FromStringAndSize( >> (const char *)>attr, sizeof(evsel->attr))); >> -pydict_set_item_string_decref(dict, "sample", >> PyString_FromStringAndSize( >> -(const char *)sample, sizeof(*sample))); >> + >> +/* PID/TIDs are limited to 2^29, so we can safely use PyInt */ >> +pydict_set_item_string_decref(dict_sample, "pid", >> PyInt_FromLong(sample->pid)); >> +pydict_set_item_string_decref(dict_sample, "tid", >> PyInt_FromLong(sample->tid)); >> +pydict_set_item_string_decref(dict_sample, "cpu", >> PyInt_FromLong(sample->cpu)); >> +pydict_set_item_string_decref(dict_sample, "time", >> PyLong_FromUnsignedLongLong(sample->time)); >> +pydict_set_item_string_decref(dict, "sample", dict_sample); >> + >> +/* ip unwinding */ >> + >> +if (symbol_conf.use_callchain && sample->callchain) { >> +PyObject *pylist; >> + >> +if (machine__resolve_callchain(machine, evsel, al->thread, >> + sample, NULL, NULL, >> PERF_MAX_STACK_DEPTH) != 0) { >> +pr_err("Failed to resolve callchain. Skipping\n"); >> +goto exit; >> +} >> +callchain_cu
Re: [PATCH] Provide additional sample information to Python scripts
On 2014-03-07 15:18, Arnaldo Carvalho de Melo wrote: Em Tue, Feb 18, 2014 at 09:43:24AM +0100, Joseph Schuchart escreveu: Good morning, We have developed a patch for the perf Python scripting interface to provide additional information about the pid, tid, and cpu of generic events as well as information about the call-stack and resolved symbol names. This provides scripts with a greater level of detail. The mentioned information is already available to the scripting engine and just has to be handed down. This is done by the attached patch. The patch is based on Linux-3.13.3. Please let me know if you have any questions on this. Can you please resend, against the perf/core branch in git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git, and as an attachement or making sure that the patch is not mangled? Joseph and I are working on this but due to vacation we expect to send you the patch with some improvements in the beginning of April. Please let us know in case there is some deadline. Thanks, Thomas - Arnaldo Thanks Joseph -- Dipl. Inf. Joseph Schuchart Computer Scientist Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) 01062 Dresden, Germany Phone: +49 351 463-36494 Fax: +49 351 463-3773 E-Mail: joseph.schuch...@tu-dresden.de Perf: Provide sample information and call-chain to Python script Provide additional sample information on generic events to Python scripts, including pid, tid, and cpu for which the event was recorded. Additionally, provide the call-stack recorded at each event with resolved symbols. At the moment, the pointer to the sample struct is passed to scripts, which seems to be of little use. The patch puts this information in dictionaries for easy access by Python scripts. Signed-off-by: Joseph Schuchart joseph.schuch...@tu-dresden.de Acked-by: Thomas Ilsche thomas.ils...@tu-dresden.de @@ -359,7 +359,7 @@ static void python_process_general_event struct thread *thread, struct addr_location *al) { -PyObject *handler, *retval, *t, *dict; +PyObject *handler, *retval, *t, *dict, *dict_sample; static char handler_name[64]; unsigned n = 0; @@ -375,6 +375,10 @@ static void python_process_general_event if (!dict) Py_FatalError(couldn't create Python dictionary); +dict_sample = PyDict_New(); +if (!dict_sample) +Py_FatalError(couldn't create Python dictionary); + snprintf(handler_name, sizeof(handler_name), %s, process_event); handler = PyDict_GetItemString(main_dict, handler_name); @@ -384,8 +388,76 @@ static void python_process_general_event pydict_set_item_string_decref(dict, ev_name, PyString_FromString(perf_evsel__name(evsel))); pydict_set_item_string_decref(dict, attr, PyString_FromStringAndSize( (const char *)evsel-attr, sizeof(evsel-attr))); -pydict_set_item_string_decref(dict, sample, PyString_FromStringAndSize( -(const char *)sample, sizeof(*sample))); + +/* PID/TIDs are limited to 2^29, so we can safely use PyInt */ +pydict_set_item_string_decref(dict_sample, pid, PyInt_FromLong(sample-pid)); +pydict_set_item_string_decref(dict_sample, tid, PyInt_FromLong(sample-tid)); +pydict_set_item_string_decref(dict_sample, cpu, PyInt_FromLong(sample-cpu)); +pydict_set_item_string_decref(dict_sample, time, PyLong_FromUnsignedLongLong(sample-time)); +pydict_set_item_string_decref(dict, sample, dict_sample); + +/* ip unwinding */ + +if (symbol_conf.use_callchain sample-callchain) { +PyObject *pylist; + +if (machine__resolve_callchain(machine, evsel, al-thread, + sample, NULL, NULL, PERF_MAX_STACK_DEPTH) != 0) { +pr_err(Failed to resolve callchain. Skipping\n); +goto exit; +} +callchain_cursor_commit(callchain_cursor); + +pylist = PyList_New(0); +if (!pylist) +Py_FatalError(couldn't create Python list); + +while (1) { +PyObject *pyelem; +struct callchain_cursor_node *node; +node = callchain_cursor_current(callchain_cursor); +if (!node) +break; + +pyelem = PyDict_New(); +if (!pyelem) +Py_FatalError(couldn't create Python dictionary); + + +pydict_set_item_string_decref(pyelem, ip, PyInt_FromLong(node-ip)); + +if (node-sym) { +PyObject *pysym = PyDict_New(); +if (!pysym) +Py_FatalError