Re: [RFC PATCH 0/3] CFS idle injection
On 11/10/15, Peter Zijlstra wrote:
> On Tue, Nov 10, 2015 at 10:07:35AM +, Juri Lelli wrote:
> > Do you think that using SCHED_DEADLINE here would be completely
> > foolish? I mean, we would have the duty_cycle/period thing for free, it
> > would be known to the scheduler (as to maybe address Thomas' concerns)
> > and we could think to make idle injection part of system analysis (for
> > the soft-RT use cases).
>
> DEADLINE would be awesome, but I think we need work on two fronts before
> we can really sell it as the awesome that it is ;-)
>
>  - greedy and/or statistical bounds
>  - !priv
>
> The first is such that we can better deal with the erratic nature of
> media decode without going full worst case on it, and the second just
> makes it so much more accessible.
>

Right. For the first point I think we just need to get our off-line
calculation right, not that we need to modify the implementation. On the
second point we need more work, yes.

Also, another thing that is missing is frequency (uarch) scaling for
reservation parameters, something like what we are doing for CFS; this
last point might be solved sooner :-).

Thanks,

- Juri
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/3] CFS idle injection
On Tue, Nov 10, 2015 at 10:07:35AM +, Juri Lelli wrote:
> Do you think that using SCHED_DEADLINE here would be completely
> foolish? I mean, we would have the duty_cycle/period thing for free, it
> would be known to the scheduler (as to maybe address Thomas' concerns)
> and we could think to make idle injection part of system analysis (for
> the soft-RT use cases).

DEADLINE would be awesome, but I think we need work on two fronts before
we can really sell it as the awesome that it is ;-)

 - greedy and/or statistical bounds
 - !priv

The first is such that we can better deal with the erratic nature of
media decode without going full worst case on it, and the second just
makes it so much more accessible.
Re: [RFC PATCH 0/3] CFS idle injection
Hi,

On 9 November 2015 at 14:15, Peter Zijlstra wrote:
> On Mon, Nov 09, 2015 at 11:56:51AM +, Punit Agrawal wrote:
>> Jacob Pan writes:
>> > My take is that RT and throttling will never go well together since they
>> > are conflicting in principle.
>>
>> I am not sure I follow. If RT (or other higher priority classes) can't
>> be throttled then the CPUs are not able to contribute towards
>> constraining power consumption and hence temperature.
>>
>> This is especially true in certain platforms where tasks belong to the
>> RT class to maintain user experience, e.g., audio and video.
>
> Audio/Video playback generally doesn't take a _lot_ of time these days.
> What is important though is _when_ it happens.
>
> And media playback typically already has a very well defined and stable
> cadence (24Hz or whatnot). What you want is for your idle injector to
> sync up with that, not disrupt it.
>

Do you think that using SCHED_DEADLINE here would be completely
foolish? I mean, we would have the duty_cycle/period thing for free, it
would be known to the scheduler (as to maybe address Thomas' concerns)
and we could think to make idle injection part of system analysis (for
the soft-RT use cases).

Thanks,

- Juri

> For other workloads, missing a deadline is about as bad as destroying
> the chip, complete system shutdown might be safer than getting delayed.
> (The very tired scenario of a saw, a laser and your finger; you want to
> shut down the entire machine rather than just cut off your finger.)
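As a rough illustration of the "duty_cycle/period thing for free" remark, the sketch below maps an idle-injection duty cycle onto the three SCHED_DEADLINE reservation parameters (runtime, deadline, period), which the kernel expects to satisfy runtime <= deadline <= period. The helper name `idle_injection_params` and the `DeadlineParams` container are hypothetical and not part of the RFC patches; this only models the arithmetic, it does not call into the kernel.

```python
from dataclasses import dataclass

NSEC_PER_MSEC = 1_000_000


@dataclass(frozen=True)
class DeadlineParams:
    """Hypothetical container mirroring the sched_attr reservation fields."""
    runtime_ns: int   # would map to sched_attr.sched_runtime
    deadline_ns: int  # would map to sched_attr.sched_deadline
    period_ns: int    # would map to sched_attr.sched_period


def idle_injection_params(period_ms: int, idle_pct: int) -> DeadlineParams:
    """Express 'idle for idle_pct percent of every period' as a reservation.

    Assumption: the injected idle time is modelled as the runtime of a
    deadline entity whose relative deadline equals its period.
    """
    if period_ms <= 0:
        raise ValueError("period must be positive")
    if not 0 < idle_pct < 100:
        raise ValueError("idle percentage must be between 1 and 99")
    period_ns = period_ms * NSEC_PER_MSEC
    runtime_ns = period_ns * idle_pct // 100
    return DeadlineParams(runtime_ns, period_ns, period_ns)


def utilization(params: DeadlineParams) -> float:
    """Bandwidth the reservation claims (runtime / period)."""
    return params.runtime_ns / params.period_ns
```

For example, `idle_injection_params(20, 25)` describes 5 ms of forced idle in every 20 ms window, which is the kind of quantity an admission test or schedulability analysis could account for.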
Re: [RFC PATCH 0/3] CFS idle injection
On Mon, Nov 09, 2015 at 01:23:04PM -0800, Jacob Pan wrote:
> what is WFI?

Wait For Interrupt; very like the x86 HLT thing.

> For Intel, idle states are hints to the HW. The FW decides how far the
> idle can go based on many factors, device states included, some are
> visible to the OS some are not. We just to help mature such deep idle
> conditions.

On some ARM you have to manually orchestrate cluster idle, which is
clustered idle states in cpuidle. The up-side is that you explicitly know
about them, the down side is that it's cross CPU bits and a freak show
(think doing cross CPU atomics while a CPU isn't in the coherency domain
yet).
Re: [RFC PATCH 0/3] CFS idle injection
On Fri, 6 Nov 2015 21:55:49 + Dietmar Eggemann wrote:
> > what I am interested in is not per cpu idle state but rather at the
> > package level or domain. It must be an indication for the
> > overlapped idle time. Usually has to come from HW counters.
>
> I see. We have a similar problem with the Energy Model (EM) on
> cluster level (sched domain level DIE). We iterate over the cpus of a
> sched group and declare the shallowest cpu idle state as the cluster
> idle state to index our EM. On a typical ARM system we have (active,
> WFI, cpu-off and cluster-off). But I guess for you the idle state
> index is only for core idle states and you can't draw any conclusions
> from this for the package idle states.

what is WFI?

For Intel, idle states are hints to the HW. The FW decides how far the
idle can go based on many factors, device states included; some are
visible to the OS, some are not. We just to help mature such deep idle
conditions.
Re: [RFC PATCH 0/3] CFS idle injection
On Mon, 9 Nov 2015 15:15:34 +0100 Peter Zijlstra wrote:
> On Mon, Nov 09, 2015 at 11:56:51AM +, Punit Agrawal wrote:
> > Jacob Pan writes:
> > > My take is that RT and throttling will never go well together
> > > since they are conflicting in principle.
> >
> > I am not sure I follow. If RT (or other higher priority classes)
> > can't be throttled then the CPUs are not able to contribute towards
> > constraining power consumption and hence temperature.
> >
> > This is especially true in certain platforms where tasks belong to
> > the RT class to maintain user experience, e.g., audio and video.
>
> Audio/Video playback generally doesn't take a _lot_ of time these
> days. What is important though is _when_ it happens.
>
> And media playback typically already has a very well defined and
> stable cadence (24Hz or whatnot). What you want is for your idle
> injector to sync up with that, not disrupt it.
>

Agreed, I have tested idle injection on video playback (mostly one cpu
busy, no sync with gpu); it does not do well at improving energy
efficiency. With the video playback being offloaded, there is no thermal
condition either. So it is outside the scope of what this first patchset
is trying to solve. The ability to sync with an external pattern could be
the next step, kind of like a PLL in hw :).

> For other workloads, missing a deadline is about as bad as destroying
> the chip, complete system shutdown might be safer than getting
> delayed. (The very tired scenario of a saw, a laser and your finger;
> you want to shut down the entire machine rather than just cut off
> your finger.)
Re: [RFC PATCH 0/3] CFS idle injection
On Mon, 09 Nov 2015 11:56:51 + Punit Agrawal wrote:
> > actually, I was suggesting to start considering idle injection once
> > frequency capped to the energy efficient point, which can be much
> > higher than the lowest frequency. The idea being, deep idle power is
> > negligible compared to running power which allows near linear
> > power-perf scaling for balanced workload.
> > Below energy efficient frequency, continuous lowering frequency may
> > lose disproportionate performance vs. power, i.e. worse than linear.
> >
>
> I agree. I was making the assumption that with the ability to inject
> idle states, there wouldn't be a need to expose the inefficient
> frequency states.
>
> Do you still see a reason to do that?

Yes, but it is up to a governor or management sw to decide when to pick
which mechanism. There may be certain workloads that scale better with
frequency change, e.g. with an unbalanced workload we don't want to
inject idle on all cpus if just one is busy. But it is also unlikely to
run into a thermal issue in this case.
Re: [RFC PATCH 0/3] CFS idle injection
On Mon, Nov 09, 2015 at 11:56:51AM +, Punit Agrawal wrote:
> Jacob Pan writes:
> > My take is that RT and throttling will never go well together since they
> > are conflicting in principle.
>
> I am not sure I follow. If RT (or other higher priority classes) can't
> be throttled then the CPUs are not able to contribute towards
> constraining power consumption and hence temperature.
>
> This is especially true in certain platforms where tasks belong to the
> RT class to maintain user experience, e.g., audio and video.

Audio/Video playback generally doesn't take a _lot_ of time these days.
What is important though is _when_ it happens.

And media playback typically already has a very well defined and stable
cadence (24Hz or whatnot). What you want is for your idle injector to
sync up with that, not disrupt it.

For other workloads, missing a deadline is about as bad as destroying
the chip, complete system shutdown might be safer than getting delayed.
(The very tired scenario of a saw, a laser and your finger; you want to
shut down the entire machine rather than just cut off your finger.)
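To make the "sync up with that, not disrupt it" point concrete, here is a minimal sketch of placing an injected idle window inside the slack of a periodic media frame. It assumes, purely for illustration, that the decode work runs at the start of each frame period and takes a known busy time; the function `idle_window` and its parameters are hypothetical and do not come from the patches.

```python
from typing import Optional, Tuple

NSEC_PER_SEC = 1_000_000_000


def idle_window(frame_start_ns: int, busy_ns: int, period_ns: int,
                idle_ns: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of an idle window that fits in the frame's slack.

    Assumption: the media task is busy for busy_ns right after
    frame_start_ns. The idle window is placed directly after that work and
    must end before the next frame begins; otherwise None is returned so the
    caller can skip or shorten the injection instead of delaying the frame.
    """
    if min(busy_ns, period_ns, idle_ns) < 0 or period_ns == 0:
        raise ValueError("durations must be non-negative, period non-zero")
    start = frame_start_ns + busy_ns
    end = start + idle_ns
    if end > frame_start_ns + period_ns:
        return None  # not enough slack in this frame
    return (start, end)


# 24 Hz cadence, as in the example from the thread.
FRAME_PERIOD_NS = NSEC_PER_SEC // 24
```

With a 24 Hz cadence the period is roughly 41.7 ms, so 10 ms of decode leaves room for a 20 ms idle window but not for a 35 ms one.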
Re: [RFC PATCH 0/3] CFS idle injection
Jacob Pan writes:

> On Fri, 06 Nov 2015 16:50:15 +
> Punit Agrawal wrote:
>
>> * idle injection once frequencies have been capped to the lowest
>>   feasible values (as suggested in the cover letter)
>>
> actually, I was suggesting to start considering idle injection once
> frequency capped to the energy efficient point, which can be much
> higher than the lowest frequency. The idea being, deep idle power is
> negligible compared to running power which allows near linear
> power-perf scaling for balanced workload.
> Below energy efficient frequency, continuous lowering frequency may
> lose disproportionate performance vs. power, i.e. worse than linear.
>

I agree. I was making the assumption that with the ability to inject
idle states, there wouldn't be a need to expose the inefficient
frequency states.

Do you still see a reason to do that?

>> One question about the implementation in these patches - should the
>> implementation hook into pick_next_task in core instead of CFS? Higher
>> priority tasks might get in the way of idle injection.
> My take is that RT and throttling will never go well together since they
> are conflicting in principle.

I am not sure I follow. If RT (or other higher priority classes) can't
be throttled then the CPUs are not able to contribute towards
constraining power consumption and hence temperature.

This is especially true in certain platforms where tasks belong to the
RT class to maintain user experience, e.g., audio and video.
Re: [RFC PATCH 0/3] CFS idle injection
On 11/06/2015 07:10 PM, Jacob Pan wrote:
> On Fri, 6 Nov 2015 18:30:01 + Dietmar Eggemann wrote:
>> On 05/11/15 10:12, Peter Zijlstra wrote:
>>>
>>> People, trim your emails!
>>>
>>> On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
>>>
>>>>> I also like #2 too. Specially now that it is not limited to a
>>>>> specific platform. One question though, could you still keep the
>>>>> cooling device support of it? In some systems, it might make
>>>>> sense to enable / disable idle injections based on temperature.
>>>
>>>> One of the key differences between 1 and 2 is that #2 is open loop
>>>> control, since we don't have CPU c-states info baked into
>>>> scheduler.
>>>
>>> _yet_, there's people working on that. The whole power aware
>>> scheduling stuff needs that.
>>
>> Isn't the idle state information (rq->idle_state) already used in
>> find_idlest_cpu()?
>>
>> What we use in energy aware scheduling is quite similar but since
>> we're interested in the index information of the c-state (to access
>> the right element of the idle_state vectors of the energy model), we
>> added rq->idle_state_idx.
>
> what I am interested in is not per cpu idle state but rather at the
> package level or domain. It must be an indication for the overlapped
> idle time. Usually has to come from HW counters.

I see. We have a similar problem with the Energy Model (EM) on cluster
level (sched domain level DIE). We iterate over the cpus of a sched
group and declare the shallowest cpu idle state as the cluster idle
state to index our EM. On a typical ARM system we have (active, WFI,
cpu-off and cluster-off). But I guess for you the idle state index is
only for core idle states and you can't draw any conclusions from this
for the package idle states.
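The "shallowest cpu idle state wins" rule described above can be sketched as follows. The four states follow the typical ARM list given in the message; the numeric ordering, the enum name `IdleState` and the function `cluster_idle_state` are illustrative assumptions, not the actual energy-aware scheduling code.

```python
from enum import IntEnum
from typing import Iterable


class IdleState(IntEnum):
    """Idle states ordered from shallowest (0) to deepest (assumed order)."""
    ACTIVE = 0
    WFI = 1
    CPU_OFF = 2
    CLUSTER_OFF = 3


def cluster_idle_state(cpu_states: Iterable[IdleState]) -> IdleState:
    """Pick the shallowest per-cpu idle state as the cluster idle state.

    A cluster can only be as deeply idle as its least idle cpu, so the
    minimum index is used (for example to index an energy model table).
    """
    states = list(cpu_states)
    if not states:
        raise ValueError("a sched group needs at least one cpu")
    return min(states)
```

A group with cpus in CLUSTER_OFF, WFI and CPU_OFF is treated as WFI, and any ACTIVE cpu makes the whole cluster count as active.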
Re: [RFC PATCH 0/3] CFS idle injection
On Fri, 06 Nov 2015 16:50:15 + Punit Agrawal wrote:
> * idle injection once frequencies have been capped to the lowest
>   feasible values (as suggested in the cover letter)
>

actually, I was suggesting to start considering idle injection once
frequency capped to the energy efficient point, which can be much
higher than the lowest frequency. The idea being, deep idle power is
negligible compared to running power which allows near linear
power-perf scaling for balanced workload.
Below energy efficient frequency, continuous lowering frequency may
lose disproportionate performance vs. power, i.e. worse than linear.

> One question about the implementation in these patches - should the
> implementation hook into pick_next_task in core instead of CFS? Higher
> priority tasks might get in the way of idle injection.

My take is that RT and throttling will never go well together since they
are conflicting in principle.
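The argument about the energy efficient point can be illustrated with a toy power model. Everything in it is an assumption made for the sketch: below the efficient frequency the voltage is taken to be at its floor, so running power is modelled as `k * freq + static`, and the static term is paid whenever the cpu is running. The coefficient values used in the note below are made up and not measured on any platform.

```python
def running_power(freq: float, k: float, static: float) -> float:
    """Toy running power at the voltage floor: dynamic part linear in freq."""
    return k * freq + static


def power_freq_scaling(perf_fraction: float, f_eff: float,
                       k: float, static: float) -> float:
    """Reach perf_fraction of the efficient-point performance by lowering
    the frequency below f_eff while staying busy all the time."""
    return running_power(perf_fraction * f_eff, k, static)


def power_idle_injection(perf_fraction: float, f_eff: float, k: float,
                         static: float, idle_power: float) -> float:
    """Reach the same performance by running at f_eff for perf_fraction of
    the time and sitting in deep idle for the rest."""
    busy = perf_fraction
    return busy * running_power(f_eff, k, static) + (1.0 - busy) * idle_power
```

In this model the difference between the two is `(1 - perf_fraction) * (static - idle_power)`, so idle injection comes out ahead whenever deep idle power is below the static running power. With `k=1.0`, `f_eff=1.0`, `static=0.5`, `idle_power=0.05` and half performance, frequency scaling costs 1.0 while idle injection costs 0.775.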
Re: [RFC PATCH 0/3] CFS idle injection
On Fri, 6 Nov 2015 18:30:01 + Dietmar Eggemann wrote:
> On 05/11/15 10:12, Peter Zijlstra wrote:
> >
> > People, trim your emails!
> >
> > On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
> >
> >>> I also like #2 too. Specially now that it is not limited to a
> >>> specific platform. One question though, could you still keep the
> >>> cooling device support of it? In some systems, it might make
> >>> sense to enable / disable idle injections based on temperature.
> >
> >> One of the key differences between 1 and 2 is that #2 is open loop
> >> control, since we don't have CPU c-states info baked into
> >> scheduler.
> >
> > _yet_, there's people working on that. The whole power aware
> > scheduling stuff needs that.
>
> Isn't the idle state information (rq->idle_state) already used in
> find_idlest_cpu()?
>
> What we use in energy aware scheduling is quite similar but since
> we're interested in the index information of the c-state (to access
> the right element of the idle_state vectors of the energy model), we
> added rq->idle_state_idx.
>

what I am interested in is not per cpu idle state but rather at the
package level or domain. It must be an indication for the overlapped
idle time. Usually has to come from HW counters.
Re: [RFC PATCH 0/3] CFS idle injection
On 05/11/15 10:12, Peter Zijlstra wrote:
>
> People, trim your emails!
>
> On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
>
>>> I also like #2 too. Specially now that it is not limited to a specific
>>> platform. One question though, could you still keep the cooling device
>>> support of it? In some systems, it might make sense to enable /
>>> disable idle injections based on temperature.
>
>> One of the key differences between 1 and 2 is that #2 is open loop
>> control, since we don't have CPU c-states info baked into scheduler.
>
> _yet_, there's people working on that. The whole power aware scheduling
> stuff needs that.

Isn't the idle state information (rq->idle_state) already used in
find_idlest_cpu()?

What we use in energy aware scheduling is quite similar but since we're
interested in the index information of the c-state (to access the right
element of the idle_state vectors of the energy model), we added
rq->idle_state_idx.

[...]
Re: [RFC PATCH 0/3] CFS idle injection
Peter Zijlstra writes:

> People, trim your emails!
>
> On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
>
>> > I also like #2 too. Specially now that it is not limited to a specific
>> > platform. One question though, could you still keep the cooling device
>> > support of it? In some systems, it might make sense to enable /
>> > disable idle injections based on temperature.
>
>> One of the key difference between 1 and 2 is that #2 is open loop
>> control, since we don't have CPU c-states info baked into scheduler.
>
> _yet_, there's people working on that. The whole power aware scheduling
> stuff needs that.
>
>> To close the loop, perhaps we can export some internal APIs to the
>> thermal subsystem then the thermal governors can pick the condition to
>> inject idle.
>
> I would much rather that all be part of the power aware stuff, such that
> the scheduler itself is aware of thermal limits and can migrate load
> away if needed.

I was wondering if we could use cpu capacity as the interface between
the thermal sub-system and the scheduler. This would be better than
dealing with frequency caps and idle injection percentages directly in
the scheduler.

We've been playing with making the scheduler respect capacity caps due
to thermal constraints and have tasks migrated away to less capped
cores. It would be great if, in addition to the frequency caps, we could
add idle injection to the arsenal. This would allow building policies on
top such as -

* pure idle injection where frequency capping is unsuitable (or
  unavailable)
* a smooth continuum of capacities using a combination of frequency and
  capacity capping
* idle injection once frequencies have been capped to the lowest
  feasible values (as suggested in the cover letter)

One question about the implementation in these patches - should the
implementation hook into pick_next_task in core instead of CFS? Higher
priority tasks might get in the way of idle injection.
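As a rough illustration of the "capacity as the interface" idea, the sketch below folds a frequency cap and an idle injection percentage into a single capacity value. It is not taken from the patches or from the kernel: the function name capped_capacity() and its parameters are hypothetical, and it assumes that capacity scales linearly with both frequency and busy time, which real hardware only approximates. The 1024 scale mirrors the kernel's SCHED_CAPACITY_SCALE.

```c
#include <assert.h>

/* Full capacity, using the same fixed-point scale as SCHED_CAPACITY_SCALE. */
#define CAPACITY_SCALE 1024u

/*
 * Hypothetical helper (not a kernel API): combine a frequency cap and an
 * idle injection percentage into one capacity value.
 * Assumption: capacity is linear in frequency and in the busy fraction.
 */
unsigned int capped_capacity(unsigned int cap_freq_khz,
                             unsigned int max_freq_khz,
                             unsigned int idle_pct)
{
    unsigned long long cap;

    assert(max_freq_khz > 0 && cap_freq_khz <= max_freq_khz);
    assert(idle_pct <= 100);

    /* Scale full capacity by the frequency ratio. */
    cap = (unsigned long long)CAPACITY_SCALE * cap_freq_khz / max_freq_khz;
    /* Scale again by the fraction of time the CPU is allowed to run. */
    cap = cap * (100 - idle_pct) / 100;

    return (unsigned int)cap;
}
```

Under these assumptions, a CPU capped to half its maximum frequency with 50% idle injection would report a quarter of its full capacity, so a thermal governor could hand the scheduler one number instead of two separate knobs.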
Re: [RFC PATCH 0/3] CFS idle injection
On Fri, 06 Nov 2015 16:50:15 + Punit Agrawal wrote:

> * idle injection once frequencies have been capped to the lowest
> feasible values (as suggested in the cover letter)
>
Actually, I was suggesting that we start considering idle injection once
the frequency is capped to the energy-efficient point, which can be much
higher than the lowest frequency. The idea is that deep idle power is
negligible compared to running power, which allows near-linear
power-perf scaling for balanced workloads. Below the energy-efficient
frequency, continuing to lower the frequency may lose a disproportionate
amount of performance relative to the power saved, i.e. worse than
linear.

> One question about the implementation in these patches - should the
> implementation hook into pick_next_task in core instead of CFS? Higher
> priority tasks might get in the way of idle injection.

My take is that RT and throttling will never go well together since
they conflict in principle.
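The policy described here (cap frequency only down to the energy-efficient point, then switch to idle injection) can be sketched as below. This is a minimal illustration under stated assumptions, not code from the patch set: struct throttle_plan, plan_throttle() and the notion of a "target percentage of full performance" are all hypothetical, and performance is assumed to scale linearly with frequency and with busy time.

```c
#include <assert.h>

/* Hypothetical result type: a frequency cap plus an idle injection ratio. */
struct throttle_plan {
    unsigned int freq_khz;  /* frequency cap to apply */
    unsigned int idle_pct;  /* share of time to force idle, 0..100 */
};

/*
 * Hypothetical policy helper, not taken from the patches.
 * target_pct is the performance budget as a percentage of full speed.
 * Frequency is lowered only as far as efficient_freq_khz; any further
 * reduction is obtained by injecting idle time at that frequency.
 */
struct throttle_plan plan_throttle(unsigned int target_pct,
                                   unsigned int max_freq_khz,
                                   unsigned int efficient_freq_khz)
{
    struct throttle_plan plan;
    unsigned long long want_khz;

    assert(target_pct <= 100);
    assert(efficient_freq_khz > 0 && efficient_freq_khz <= max_freq_khz);

    /* Frequency that would meet the budget with no idle injection. */
    want_khz = (unsigned long long)max_freq_khz * target_pct / 100;

    if (want_khz >= efficient_freq_khz) {
        /* Still in the efficient range: frequency capping alone. */
        plan.freq_khz = (unsigned int)want_khz;
        plan.idle_pct = 0;
    } else {
        /* Hold the efficient frequency and make up the rest with idle. */
        plan.freq_khz = efficient_freq_khz;
        plan.idle_pct =
            (unsigned int)(100 - want_khz * 100 / efficient_freq_khz);
    }
    return plan;
}
```

For example, with a 2 GHz maximum and a 900 MHz efficient point, an 18% budget would hold the CPU at 900 MHz and inject 60% idle time rather than dropping the frequency to 360 MHz.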
Re: [RFC PATCH 0/3] CFS idle injection
On 11/06/2015 07:10 PM, Jacob Pan wrote:
> On Fri, 6 Nov 2015 18:30:01 + Dietmar Eggemann wrote:
>
>> On 05/11/15 10:12, Peter Zijlstra wrote:
>>>
>>> People, trim your emails!
>>>
>>> On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
>>>
>>>>> I also like #2 too. Specially now that it is not limited to a
>>>>> specific platform. One question though, could you still keep the
>>>>> cooling device support of it? In some systems, it might make
>>>>> sense to enable / disable idle injections based on temperature.
>>>
>>>> One of the key difference between 1 and 2 is that #2 is open loop
>>>> control, since we don't have CPU c-states info baked into
>>>> scheduler.
>>>
>>> _yet_, there's people working on that. The whole power aware
>>> scheduling stuff needs that.
>>
>> Isn't the idle state information (rq->idle_state) already used in
>> find_idlest_cpu()?
>>
>> What we use in energy aware scheduling is quite similar but since
>> we're interested in the index information of the c-state (to access
>> the right element of the idle_state vectors of the energy model, we
>> added rq->idle_state_idx.
>
> what i am interested is not per cpu idle state but rather at the
> package level or domain. It must be an indication for the overlapped
> idle time. Usually has to come from HW counters.

I see. We have a similar problem with the Energy Model (EM) on cluster
level (sched domain level DIE). We iterate over the cpus of a sched
group and declare the shallowest cpu idle state as the cluster idle
state to index our EM. On a typical ARM system we have (active, WFI,
cpu-off and cluster-off).

But I guess for you the idle state index is only for core idle states
and you can't draw any conclusions from this for the package idle
states.
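The "shallowest cpu idle state wins" rule described above can be sketched in a few lines. This is a userspace illustration only: the enum, the function name cluster_idle_state() and the plain array of per-CPU indices are hypothetical stand-ins for the sched group iteration and rq->idle_state_idx, and the ordering of the four states from shallow to deep is an assumption based on the list in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical idle state indices, ordered from shallowest to deepest. */
enum idle_idx {
    IDLE_ACTIVE = 0,
    IDLE_WFI,
    IDLE_CPU_OFF,
    IDLE_CLUSTER_OFF
};

/*
 * Hypothetical helper: derive a cluster-level idle state from the
 * per-CPU idle state indices of the CPUs in one group. The cluster can
 * be no deeper than its shallowest CPU, so the minimum index is used.
 */
int cluster_idle_state(const int *cpu_idle_idx, size_t nr_cpus)
{
    int shallowest;
    size_t i;

    assert(cpu_idle_idx != NULL && nr_cpus > 0);

    shallowest = cpu_idle_idx[0];
    for (i = 1; i < nr_cpus; i++) {
        if (cpu_idle_idx[i] < shallowest)
            shallowest = cpu_idle_idx[i];
    }
    return shallowest;
}
```

Note that this only bounds the cluster state from the software view of each core. As pointed out in the thread, the package state actually reached may differ and usually has to be read from hardware counters.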
Re: [RFC PATCH 0/3] CFS idle injection
People, trim your emails!

On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:

> > I also like #2 too. Specially now that it is not limited to a specific
> > platform. One question though, could you still keep the cooling device
> > support of it? In some systems, it might make sense to enable /
> > disable idle injections based on temperature.

> One of the key difference between 1 and 2 is that #2 is open loop
> control, since we don't have CPU c-states info baked into scheduler.

_yet_, there's people working on that. The whole power aware scheduling
stuff needs that.

> To close the loop, perhaps we can export some internal APIs to the
> thermal subsystem then the thermal governors can pick the condition to
> inject idle.

I would much rather that all be part of the power aware stuff, such that
the scheduler itself is aware of thermal limits and can migrate load
away if needed.
Re: [RFC PATCH 0/3] CFS idle injection
Hello Jacob, Srinivas,

On Wed, Nov 04, 2015 at 09:05:52AM -0800, Srinivas Pandruvada wrote:
> On Wed, 2015-11-04 at 08:58 -0800, Jacob Pan wrote:
> > > > I have two choices for this code:
> > > > 1) be part of existing powerclamp driver but require exporting some
> > > >    sched APIs.
> > > > 2) be part of sched since the general rule applies when it comes
> > > >    down to synchronized idle time for best power savings.
> > > >
> > > > The patches below are for #2. There is a known problem with LOW RES
> > > > timer mode that I am working on. But I am hoping to get review
> > > > earlier.
> > > >
> > >
> > > I also like #2 too. Specially now that it is not limited to a specific
> > > platform. One question though, could you still keep the cooling device
> > > support of it? In some systems, it might make sense to enable /
> > > disable idle injections based on temperature.
> > >
> > One of the key difference between 1 and 2 is that #2 is open loop
> > control, since we don't have CPU c-states info baked into scheduler. To
> > close the loop, perhaps we can export some internal APIs to the thermal
> > subsystem then the thermal governors can pick the condition to inject
> > idle.

Jacob,

I also like this direction. With the proper APIs exported, creating a
cooling device that uses them would be a natural path. Then one could
create a thermal zone that plugs together a governor and the idle
injection cooling device that uses the exported APIs.

> > > Was there any particular reason you dropped the cooling device
> > > support?
> > >
> > I did sysctl instead of thermal sysfs to conform the rest of the sched
> > tuning knobs. We could also have a proxy cooling device to call
> > internal APIs mentioned above.

Agreed here then.

> I think we should have cooling device as we are already using this
> cooling device. Once it pass RFC stage,I think we should consider add
> this.

Srinivas,

Yes, that seems to be a good path to follow. Thanks.

> Thanks,
> Srinivas
> >
> > Another reason is that, I intend to extend beyond thermal. Where we can
> > consolidate/sync idle work in semi-active and balanced workload.

I see.

BR,

Eduardo Valentin
Re: [RFC PATCH 0/3] CFS idle injection
On Wed, 2015-11-04 at 08:58 -0800, Jacob Pan wrote:
> On Tue, 3 Nov 2015 22:06:55 -0800
> Eduardo Valentin wrote:
>
> > Hello Jacob,
> >
> > On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> > > Hi Peter and all,
> > >
> > > A while ago, we had discussion about how powerclamp is broken in the
> > > sense of turning off idle ticks in the forced idle period.
> > > https://lkml.org/lkml/2014/12/18/369
> > >
> > > It was suggested to replace the current kthread play idle loop with
> > > a timer based runqueue throttling scheme. I finally got around to
> > > implement this and code is much simpler. I also have good test
> > > results in terms of efficiency, scalability, etc.
> > > http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> > > slide #18+ shows the data on client and server.
> > >
> > > I have two choices for this code:
> > > 1) be part of existing powerclamp driver but require exporting some
> > >    sched APIs.
> > > 2) be part of sched since the general rule applies when it comes
> > >    down to synchronized idle time for best power savings.
> > >
> > > The patches below are for #2. There is a known problem with LOW RES
> > > timer mode that I am working on. But I am hoping to get review
> > > earlier.
> > >
> >
> > I also like #2 too. Specially now that it is not limited to a specific
> > platform. One question though, could you still keep the cooling device
> > support of it? In some systems, it might make sense to enable /
> > disable idle injections based on temperature.
> >
> One of the key difference between 1 and 2 is that #2 is open loop
> control, since we don't have CPU c-states info baked into scheduler. To
> close the loop, perhaps we can export some internal APIs to the thermal
> subsystem then the thermal governors can pick the condition to inject
> idle.
> > Was there any particular reason you dropped the cooling device
> > support?
> >
> I did sysctl instead of thermal sysfs to conform the rest of the sched
> tuning knobs. We could also have a proxy cooling device to call
> internal APIs mentioned above.

I think we should have a cooling device, as we are already using this
cooling device. Once it passes the RFC stage, I think we should consider
adding this.

Thanks,
Srinivas

>
> Another reason is that, I intend to extend beyond thermal. Where we can
> consolidate/sync idle work in semi-active and balanced workload.
>
> Thanks for the suggestions,
>
> Jacob
> > BR,
> >
> > Eduardo Valentin
> >
> >
> > > We are entering a very power limited environment on client side,
> > > frequency scaling can only be efficient at certain range. e.g. on
> > > SKL, up to ~900MHz, anything below, it is increasingly more
> > > efficient to do C-states insertion if coordinated.
> > >
> > > Looking forward, there are use case beyond thermal/power capping. I
> > > think we can consolidate balanced partial busy workload that are
> > > evenly distributed among CPUs.
> > >
> > > Please let me know what you think.
> > >
> > > Thanks,
> > >
> > >
> > > Jacob Pan (3):
> > >   ktime: add a roundup function
> > >   timer: relax tick stop in idle entry
> > >   sched: introduce synchronized idle injection
> > >
> > >  include/linux/ktime.h        |  10 ++
> > >  include/linux/sched.h        |  12 ++
> > >  include/linux/sched/sysctl.h |   5 +
> > >  include/trace/events/sched.h |  23 +++
> > >  init/Kconfig                 |   8 +
> > >  kernel/sched/fair.c          | 345 +++
> > >  kernel/sched/sched.h         |   3 +
> > >  kernel/sysctl.c              |  20 +++
> > >  kernel/time/tick-sched.c     |   2 +-
> > >  9 files changed, 427 insertions(+), 1 deletion(-)
> > >
> > > --
> > > 1.9.1
>
> [Jacob Pan]
Re: [RFC PATCH 0/3] CFS idle injection
On Tue, 3 Nov 2015 22:06:55 -0800 Eduardo Valentin wrote:

> Hello Jacob,
>
> On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> > Hi Peter and all,
> >
> > A while ago, we had discussion about how powerclamp is broken in the
> > sense of turning off idle ticks in the forced idle period.
> > https://lkml.org/lkml/2014/12/18/369
> >
> > It was suggested to replace the current kthread play idle loop with
> > a timer based runqueue throttling scheme. I finally got around to
> > implement this and code is much simpler. I also have good test
> > results in terms of efficiency, scalability, etc.
> > http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> > slide #18+ shows the data on client and server.
> >
> > I have two choices for this code:
> > 1) be part of existing powerclamp driver but require exporting some
> >    sched APIs.
> > 2) be part of sched since the general rule applies when it comes
> >    down to synchronized idle time for best power savings.
> >
> > The patches below are for #2. There is a known problem with LOW RES
> > timer mode that I am working on. But I am hoping to get review
> > earlier.
> >
>
> I also like #2 too. Specially now that it is not limited to a specific
> platform. One question though, could you still keep the cooling device
> support of it? In some systems, it might make sense to enable /
> disable idle injections based on temperature.
>
One of the key differences between 1 and 2 is that #2 is open loop
control, since we don't have CPU c-states info baked into the scheduler.
To close the loop, perhaps we can export some internal APIs to the
thermal subsystem; then the thermal governors can pick the condition to
inject idle.

> Was there any particular reason you dropped the cooling device
> support?
>
I did sysctl instead of thermal sysfs to conform to the rest of the
sched tuning knobs. We could also have a proxy cooling device to call
the internal APIs mentioned above.

Another reason is that I intend to extend this beyond thermal, to cases
where we can consolidate/sync idle work in semi-active and balanced
workloads.

Thanks for the suggestions,

Jacob

> BR,
>
> Eduardo Valentin
>
>
> > We are entering a very power limited environment on client side,
> > frequency scaling can only be efficient at certain range. e.g. on
> > SKL, up to ~900MHz, anything below, it is increasingly more
> > efficient to do C-states insertion if coordinated.
> >
> > Looking forward, there are use case beyond thermal/power capping. I
> > think we can consolidate balanced partial busy workload that are
> > evenly distributed among CPUs.
> >
> > Please let me know what you think.
> >
> > Thanks,
> >
> >
> > Jacob Pan (3):
> >   ktime: add a roundup function
> >   timer: relax tick stop in idle entry
> >   sched: introduce synchronized idle injection
> >
> >  include/linux/ktime.h        |  10 ++
> >  include/linux/sched.h        |  12 ++
> >  include/linux/sched/sysctl.h |   5 +
> >  include/trace/events/sched.h |  23 +++
> >  init/Kconfig                 |   8 +
> >  kernel/sched/fair.c          | 345 +++
> >  kernel/sched/sched.h         |   3 +
> >  kernel/sysctl.c              |  20 +++
> >  kernel/time/tick-sched.c     |   2 +-
> >  9 files changed, 427 insertions(+), 1 deletion(-)
> >
> > --
> > 1.9.1

[Jacob Pan]
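The quoted cover letter lists a "ktime: add a roundup function" patch and stresses synchronized idle time across CPUs. One plausible reason for such a helper, sketched below, is to round each CPU's timer start up to the next period boundary so that timers armed at slightly different moments still expire together. This is a guess at the intent, not the code from the patch: roundup_ns() is a hypothetical userspace stand-in working on plain nanosecond counts, and the real helper's name and signature are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round a timestamp in nanoseconds up to the next
 * multiple of period_ns. A timestamp already on a boundary is returned
 * unchanged. Overflow near UINT64_MAX is not handled in this sketch.
 */
uint64_t roundup_ns(uint64_t now_ns, uint64_t period_ns)
{
    uint64_t rem;

    assert(period_ns > 0);

    rem = now_ns % period_ns;
    if (rem == 0)
        return now_ns;

    return now_ns + (period_ns - rem);
}
```

With a common period, two CPUs that call this at different times within the same period get the same expiry, which is what would let their forced idle windows overlap.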
Re: [RFC PATCH 0/3] CFS idle injection
On Tue, 3 Nov 2015 22:06:55 -0800 Eduardo Valentinwrote: > Hello Jacob, > > On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote: > > Hi Peter and all, > > > > A while ago, we had discussion about how powerclamp is broken in the > > sense of turning off idle ticks in the forced idle period. > > https://lkml.org/lkml/2014/12/18/369 > > > > It was suggested to replace the current kthread play idle loop with > > a timer based runqueue throttling scheme. I finally got around to > > implement this and code is much simpler. I also have good test > > results in terms of efficiency, scalability, etc. > > http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf > > slide #18+ shows the data on client and server. > > > > I have two choices for this code: > > 1) be part of existing powerclamp driver but require exporting some > >sched APIs. > > 2) be part of sched since the genernal rule applies when it comes > > down to sycnhronized idle time for best power savings. > > > > The patches below are for #2. There is a known problem with LOW RES > > timer mode that I am working on. But I am hoping to get review > > earlier. > > > > I also like #2 too. Specially now that it is not limited to a specific > platform. One question though, could you still keep the cooling device > support of it? In some systems, it might make sense to enable / > disable idle injections based on temperature. > One of the key difference between 1 and 2 is that #2 is open loop control, since we don't have CPU c-states info baked into scheduler. To close the loop, perhaps we can export some internal APIs to the thermal subsystem then the thermal governors can pick the condition to inject idle. > Was there any particular reason you dropped the cooling device > support? > I did sysctl instead of thermal sysfs to conform the rest of the sched tuning knobs. We could also have a proxy cooling device to call internal APIs mentioned above. 
Another reason is that, I intend to extend beyond thermal. Where we can consolidate/sync idle work in semi-active and balanced workload. Thanks for the suggestions, Jacob > BR, > > Eduardo Valentin > > > > We are entering a very power limited environment on client side, > > frequency scaling can only be efficient at certain range. e.g. on > > SKL, upto ~900MHz, anything below, it is increasingly more > > efficient to do C-states insertion if coordinated. > > > > Looking forward, there are use case beyond thermal/power capping. I > > think we can consolidate ballanced partial busy workload that are > > evenly distributed among CPUs. > > > > Please let me know what you think. > > > > Thanks, > > > > > > Jacob Pan (3): > > ktime: add a roundup function > > timer: relax tick stop in idle entry > > sched: introduce synchronized idle injection > > > > include/linux/ktime.h| 10 ++ > > include/linux/sched.h| 12 ++ > > include/linux/sched/sysctl.h | 5 + > > include/trace/events/sched.h | 23 +++ > > init/Kconfig | 8 + > > kernel/sched/fair.c | 345 > > +++ > > kernel/sched/sched.h | 3 + kernel/sysctl.c > > | 20 +++ kernel/time/tick-sched.c | 2 +- > > 9 files changed, 427 insertions(+), 1 deletion(-) > > > > -- > > 1.9.1 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe > > linux-kernel" in the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ [Jacob Pan] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/3] CFS idle injection
On Wed, 2015-11-04 at 08:58 -0800, Jacob Pan wrote: > On Tue, 3 Nov 2015 22:06:55 -0800 > Eduardo Valentinwrote: > > > Hello Jacob, > > > > On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote: > > > Hi Peter and all, > > > > > > A while ago, we had discussion about how powerclamp is broken in the > > > sense of turning off idle ticks in the forced idle period. > > > https://lkml.org/lkml/2014/12/18/369 > > > > > > It was suggested to replace the current kthread play idle loop with > > > a timer based runqueue throttling scheme. I finally got around to > > > implement this and code is much simpler. I also have good test > > > results in terms of efficiency, scalability, etc. > > > http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf > > > slide #18+ shows the data on client and server. > > > > > > I have two choices for this code: > > > 1) be part of existing powerclamp driver but require exporting some > > >sched APIs. > > > 2) be part of sched since the genernal rule applies when it comes > > > down to sycnhronized idle time for best power savings. > > > > > > The patches below are for #2. There is a known problem with LOW RES > > > timer mode that I am working on. But I am hoping to get review > > > earlier. > > > > > > > I also like #2 too. Specially now that it is not limited to a specific > > platform. One question though, could you still keep the cooling device > > support of it? In some systems, it might make sense to enable / > > disable idle injections based on temperature. > > > One of the key difference between 1 and 2 is that #2 is open loop > control, since we don't have CPU c-states info baked into scheduler. To > close the loop, perhaps we can export some internal APIs to the thermal > subsystem then the thermal governors can pick the condition to inject > idle. > > Was there any particular reason you dropped the cooling device > > support? 
> > > I did sysctl instead of thermal sysfs to conform the rest of the sched > tuning knobs. We could also have a proxy cooling device to call > internal APIs mentioned above. I think we should have cooling device as we are already using this cooling device. Once it pass RFC stage,I think we should consider add this. Thanks, Srinivas > > Another reason is that, I intend to extend beyond thermal. Where we can > consolidate/sync idle work in semi-active and balanced workload. > > Thanks for the suggestions, > > Jacob > > BR, > > > > Eduardo Valentin > > > > > > > We are entering a very power limited environment on client side, > > > frequency scaling can only be efficient at certain range. e.g. on > > > SKL, upto ~900MHz, anything below, it is increasingly more > > > efficient to do C-states insertion if coordinated. > > > > > > Looking forward, there are use case beyond thermal/power capping. I > > > think we can consolidate ballanced partial busy workload that are > > > evenly distributed among CPUs. > > > > > > Please let me know what you think. 
> > > > > > Thanks, > > > > > > > > > Jacob Pan (3): > > > ktime: add a roundup function > > > timer: relax tick stop in idle entry > > > sched: introduce synchronized idle injection > > > > > > include/linux/ktime.h| 10 ++ > > > include/linux/sched.h| 12 ++ > > > include/linux/sched/sysctl.h | 5 + > > > include/trace/events/sched.h | 23 +++ > > > init/Kconfig | 8 + > > > kernel/sched/fair.c | 345 > > > +++ > > > kernel/sched/sched.h | 3 + kernel/sysctl.c > > > | 20 +++ kernel/time/tick-sched.c | 2 +- > > > 9 files changed, 427 insertions(+), 1 deletion(-) > > > > > > -- > > > 1.9.1 > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe > > > linux-kernel" in the body of a message to majord...@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Please read the FAQ at http://www.tux.org/lkml/ > > [Jacob Pan] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/3] CFS idle injection
Hello Jacob, Srinivas,

On Wed, Nov 04, 2015 at 09:05:52AM -0800, Srinivas Pandruvada wrote:
> On Wed, 2015-11-04 at 08:58 -0800, Jacob Pan wrote:
> > > > I have two choices for this code:
> > > > 1) be part of existing powerclamp driver but require exporting some
> > > >    sched APIs.
> > > > 2) be part of sched since the general rule applies when it comes
> > > >    down to synchronized idle time for best power savings.
> > > >
> > > > The patches below are for #2. There is a known problem with LOW RES
> > > > timer mode that I am working on. But I am hoping to get review
> > > > earlier.
> > > >
> > >
> > > I also like #2. Especially now that it is not limited to a specific
> > > platform. One question though: could you still keep the cooling device
> > > support? In some systems, it might make sense to enable /
> > > disable idle injections based on temperature.
> > >
> > One of the key differences between 1 and 2 is that #2 is open loop
> > control, since we don't have CPU c-states info baked into scheduler. To
> > close the loop, perhaps we can export some internal APIs to the thermal
> > subsystem then the thermal governors can pick the condition to inject
> > idle.

Jacob,

I also like this direction. With the proper APIs exported, creating a
cooling device that uses them would be a natural path. Then, one could
create a thermal zone, plugging in a governor and the idle injection
cooling device that uses the exported APIs.

> > > Was there any particular reason you dropped the cooling device
> > > support?
> > >
> > I did sysctl instead of thermal sysfs to conform to the rest of the
> > sched tuning knobs. We could also have a proxy cooling device to call
> > internal APIs mentioned above.

Agreed here then.

> I think we should have cooling device as we are already using this
> cooling device. Once it passes the RFC stage, I think we should consider
> adding this.

Srinivas,

Yes, that seems to be a good path to follow. Thanks.

> Thanks,
> Srinivas
> >
> > Another reason is that, I intend to extend beyond thermal. Where we can
> > consolidate/sync idle work in semi-active and balanced workload.

I see.

BR,

Eduardo Valentin
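The closed-loop arrangement discussed in this message (a thermal governor deciding when to inject idle, and a cooling device applying it) could look roughly like the following. This is only a self-contained userspace model under stated assumptions: it does not use the kernel thermal framework or any API from the patch series, and `governor_step`, `state_to_idle_pct`, the 10% step, and the state cap are all hypothetical choices made for illustration. Temperatures are in millidegrees Celsius.

```c
#include <assert.h>

/* Hypothetical cooling device: state 0 disables idle injection, each
 * higher state injects 10% more idle time. Both constants are arbitrary. */
#define MAX_COOLING_STATE  5
#define IDLE_PCT_PER_STATE 10

/* Translate a cooling state into an idle injection percentage. */
static int state_to_idle_pct(int state)
{
    assert(state >= 0 && state <= MAX_COOLING_STATE);
    return state * IDLE_PCT_PER_STATE;
}

/* Hypothetical step-wise governor: raise the cooling state by one while
 * the temperature is above the trip point, lower it by one otherwise.
 * The result is clamped to [0, MAX_COOLING_STATE]. */
static int governor_step(int cur_state, int temp_mc, int trip_mc)
{
    if (temp_mc > trip_mc)
        return cur_state < MAX_COOLING_STATE ? cur_state + 1 : cur_state;
    return cur_state > 0 ? cur_state - 1 : 0;
}
```

In this model, a periodic poll would call `governor_step()` with the latest reading and pass `state_to_idle_pct()` of the result to whatever knob controls the injection ratio; the open-loop sysctl approach corresponds to setting that ratio by hand.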
Re: [RFC PATCH 0/3] CFS idle injection
Hello Jacob,

On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> Hi Peter and all,
>
> A while ago, we had a discussion about how powerclamp is broken in the
> sense of turning off idle ticks in the forced idle period.
> https://lkml.org/lkml/2014/12/18/369
>
> It was suggested to replace the current kthread play idle loop with a
> timer based runqueue throttling scheme. I finally got around to
> implementing this and the code is much simpler. I also have good test
> results in terms of efficiency, scalability, etc.
> http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> slide #18+ shows the data on client and server.
>
> I have two choices for this code:
> 1) be part of existing powerclamp driver but require exporting some
>    sched APIs.
> 2) be part of sched since the general rule applies when it comes down
>    to synchronized idle time for best power savings.
>
> The patches below are for #2. There is a known problem with LOW RES timer
> mode that I am working on. But I am hoping to get review earlier.
>

I also like #2. Especially now that it is not limited to a specific
platform. One question though: could you still keep the cooling device
support? In some systems, it might make sense to enable / disable idle
injections based on temperature.

Was there any particular reason you dropped the cooling device support?

BR,

Eduardo Valentin

> We are entering a very power limited environment on client side, frequency
> scaling can only be efficient at certain range. e.g. on SKL, up to ~900MHz,
> anything below, it is increasingly more efficient to do C-states insertion
> if coordinated.
>
> Looking forward, there are use cases beyond thermal/power capping. I think
> we can consolidate balanced partial busy workloads that are evenly
> distributed among CPUs.
>
> Please let me know what you think.
>
> Thanks,
>
> Jacob Pan (3):
>   ktime: add a roundup function
>   timer: relax tick stop in idle entry
>   sched: introduce synchronized idle injection
>
>  include/linux/ktime.h        |  10 ++
>  include/linux/sched.h        |  12 ++
>  include/linux/sched/sysctl.h |   5 +
>  include/trace/events/sched.h |  23 +++
>  init/Kconfig                 |   8 +
>  kernel/sched/fair.c          | 345 +++
>  kernel/sched/sched.h         |   3 +
>  kernel/sysctl.c              |  20 +++
>  kernel/time/tick-sched.c     |   2 +-
>  9 files changed, 427 insertions(+), 1 deletion(-)
>
> --
> 1.9.1
[RFC PATCH 0/3] CFS idle injection
Hi Peter and all,

A while ago, we had a discussion about how powerclamp is broken in the
sense of turning off idle ticks in the forced idle period.
https://lkml.org/lkml/2014/12/18/369

It was suggested to replace the current kthread play idle loop with a
timer based runqueue throttling scheme. I finally got around to
implementing this and the code is much simpler. I also have good test
results in terms of efficiency, scalability, etc.
http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
slide #18+ shows the data on client and server.

I have two choices for this code:
1) be part of existing powerclamp driver but require exporting some
   sched APIs.
2) be part of sched since the general rule applies when it comes down
   to synchronized idle time for best power savings.

The patches below are for #2. There is a known problem with LOW RES timer
mode that I am working on. But I am hoping to get review earlier.

We are entering a very power limited environment on client side, frequency
scaling can only be efficient at certain range. e.g. on SKL, up to ~900MHz,
anything below, it is increasingly more efficient to do C-states insertion
if coordinated.

Looking forward, there are use cases beyond thermal/power capping. I think
we can consolidate balanced partial busy workloads that are evenly
distributed among CPUs.

Please let me know what you think.

Thanks,

Jacob Pan (3):
  ktime: add a roundup function
  timer: relax tick stop in idle entry
  sched: introduce synchronized idle injection

 include/linux/ktime.h        |  10 ++
 include/linux/sched.h        |  12 ++
 include/linux/sched/sysctl.h |   5 +
 include/trace/events/sched.h |  23 +++
 init/Kconfig                 |   8 +
 kernel/sched/fair.c          | 345 +++
 kernel/sched/sched.h         |   3 +
 kernel/sysctl.c              |  20 +++
 kernel/time/tick-sched.c     |   2 +-
 9 files changed, 427 insertions(+), 1 deletion(-)

--
1.9.1
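The cover letter lists a ktime roundup helper as its first patch and names synchronized idle time as the goal. As a rough illustration of why rounding helps (this is a userspace sketch, not the code from the series; `roundup_ns`, `in_idle_window`, and their parameters are hypothetical names), each CPU can round the current time up to the next multiple of a shared period, so that all CPUs begin their forced idle window at the same instant regardless of when each one asked:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: round t_ns up to the next multiple of period_ns.
 * A value already on a period boundary is returned unchanged.
 * Overflow near UINT64_MAX is not handled in this sketch. */
static uint64_t roundup_ns(uint64_t t_ns, uint64_t period_ns)
{
    assert(period_ns > 0);
    uint64_t rem = t_ns % period_ns;
    return rem ? t_ns + (period_ns - rem) : t_ns;
}

/* Hypothetical duty-cycle check: the first idle_ns of every period is
 * treated as the forced idle window, the rest is left for running tasks. */
static bool in_idle_window(uint64_t t_ns, uint64_t period_ns, uint64_t idle_ns)
{
    assert(period_ns > 0 && idle_ns <= period_ns);
    return (t_ns % period_ns) < idle_ns;
}
```

For example, two CPUs that call `roundup_ns()` at 1050 ns and 1075 ns with a 100 ns period both get 1100 ns back, so timers armed for that value would fire together. Placing the idle window at the start of each period is an arbitrary choice made here; the series may arrange it differently.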