Re: [RFC PATCH 0/3] CFS idle injection

2015-11-10 Thread Juri Lelli

On 11/10/15, Peter Zijlstra wrote:
> On Tue, Nov 10, 2015 at 10:07:35AM +, Juri Lelli wrote:
> > Do you think that using SCHED_DEADLINE here would be completely
> > foolish? I mean, we would have the duty_cycle/period thing for free, it
> > would be known to the scheduler (as to maybe address Thomas' concerns)
> > and we could think to make idle injection part of system analysis (for
> > the soft-RT use cases).
> 
> DEADLINE would be awesome, but I think we need work on two fronts before
> we can really sell it as the awesome that it is ;-)
> 
>  - greedy and or statistical bounds
>  - !priv
> 
> The first is such that we can better deal with the erratic nature of
> media decode without going full worst case on it, and the second just
> makes it so much more accessible.
> 

Right. For the first point I think we just need to get our off-line
calculation right, not modify the implementation. On the second point,
yes, we need more work. Also, another thing that is missing is frequency
(uarch) scaling for reservation parameters, similar to what we are doing
for CFS; this last point might be solved sooner :-).
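A rough sketch of what frequency scaling of a reservation parameter could mean, assuming the runtime was profiled at maximum capacity and that execution time scales inversely with a capacity value on a 0..1024 scale (in the spirit of the capacity scaling being done for CFS). `CAPACITY_SCALE` is a local stand-in for the kernel's SCHED_CAPACITY_SCALE, and both helpers are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's SCHED_CAPACITY_SCALE (1 << 10). */
#define CAPACITY_SCALE 1024ULL

/*
 * Hypothetical helper: inflate a runtime budget measured at full capacity
 * so it covers the same amount of work at a lower capacity.
 * Rounds up so the scaled budget is never too small.
 */
static uint64_t dl_scale_runtime(uint64_t runtime_at_max_ns, uint64_t cur_cap)
{
    if (cur_cap == 0 || cur_cap > CAPACITY_SCALE)
        cur_cap = CAPACITY_SCALE; /* treat bogus input as full capacity */
    return (runtime_at_max_ns * CAPACITY_SCALE + cur_cap - 1) / cur_cap;
}

/* The scaled reservation is only usable if it still fits its deadline. */
static bool dl_scaled_fits(uint64_t runtime_at_max_ns, uint64_t cur_cap,
                           uint64_t deadline_ns)
{
    return dl_scale_runtime(runtime_at_max_ns, cur_cap) <= deadline_ns;
}
```

This takes the simple admission-time view (re-check the budget when capacity changes); scaling runtime consumption while the task runs would be an alternative design.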

Thanks,

- Juri

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-10 Thread Peter Zijlstra

On Tue, Nov 10, 2015 at 10:07:35AM +, Juri Lelli wrote:
> Do you think that using SCHED_DEADLINE here would be completely
> foolish? I mean, we would have the duty_cycle/period thing for free, it
> would be known to the scheduler (as to maybe address Thomas' concerns)
> and we could think to make idle injection part of system analysis (for
> the soft-RT use cases).

DEADLINE would be awesome, but I think we need work on two fronts before
we can really sell it as the awesome that it is ;-)

 - greedy and or statistical bounds
 - !priv

The first is such that we can better deal with the erratic nature of
media decode without going full worst case on it, and the second just
makes it so much more accessible.
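One possible reading of "statistical bounds": size the reservation runtime from a high percentile of measured per-frame decode times instead of the worst case, accepting occasional overruns. A minimal sketch; the nearest-rank percentile method and the helper name are assumptions, not something proposed in the thread:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: nearest-rank percentile (pct in 1..100) of n
 * measured per-frame execution times. Sorts a private copy so the
 * caller's samples are left untouched. Returns 0 on bad input.
 */
static uint64_t runtime_percentile(const uint64_t *samples, size_t n,
                                   unsigned pct)
{
    if (n == 0 || pct == 0 || pct > 100)
        return 0;
    uint64_t *tmp = malloc(n * sizeof(*tmp));
    if (!tmp)
        return 0;
    memcpy(tmp, samples, n * sizeof(*tmp));
    qsort(tmp, n, sizeof(*tmp), cmp_u64);
    /* nearest rank = ceil(pct/100 * n), converted to a 0-based index */
    size_t rank = ((size_t)pct * n + 99) / 100;
    uint64_t v = tmp[rank - 1];
    free(tmp);
    return v;
}
```

With pct = 100 this degenerates to the worst case, which is the "going full worst case" option the mail wants to avoid.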

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-10 Thread Juri Lelli

Hi,

On 9 November 2015 at 14:15, Peter Zijlstra  wrote:
> On Mon, Nov 09, 2015 at 11:56:51AM +, Punit Agrawal wrote:
>> Jacob Pan  writes:
>> > My take is that RT and throttling will never go well together since they
>> > are conflicting in principle.
>>
>> I am not sure I follow. If RT (or other higher priority classes) can't
>> be throttled then the CPUs are not able to contribute towards
>> constraining power consumption and hence temperature.
>>
>> This is especially true in certain platforms where tasks belong to the
>> RT class to maintain user experience, e.g., audio and video.
>
> Audio/Video playback generally doesn't take a _lot_ of time these days.
> What is important though is _when_ it happens.
>
> And media playback typically already has a very well defined and stable
> cadence (24Hz or whatnot).  What you want is for your idle injector to
> sync up with that, not disrupt it.
>

Do you think that using SCHED_DEADLINE here would be completely
foolish? I mean, we would have the duty_cycle/period thing for free, it
would be known to the scheduler (as to maybe address Thomas' concerns)
and we could think to make idle injection part of system analysis (for
the soft-RT use cases).
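To make the duty_cycle/period mapping concrete: a SCHED_DEADLINE reservation is described by runtime, deadline and period, so a duty cycle D over a period P maps naturally to runtime = D * P with an implicit deadline equal to P. A sketch, assuming a local mirror of the original 48-byte `struct sched_attr` layout and the raw `sched_setattr` syscall (glibc had no wrapper at the time). `dl_attr_from_duty` and `dl_set_self` are hypothetical helpers, and applying the policy normally requires privilege, which relates to the "!priv" concern raised in this thread:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define DL_POLICY_DEADLINE 6 /* value of SCHED_DEADLINE in the kernel UAPI */

/* Local mirror of the original (SCHED_ATTR_SIZE_VER0, 48 bytes) sched_attr. */
struct dl_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;  /* ns */
    uint64_t sched_deadline; /* ns */
    uint64_t sched_period;   /* ns */
};

/* Hypothetical helper: runtime = duty_pct% of period, implicit deadline. */
static int dl_attr_from_duty(struct dl_sched_attr *attr,
                             uint64_t period_ns, unsigned duty_pct)
{
    if (duty_pct == 0 || duty_pct > 100 || period_ns == 0)
        return -EINVAL;
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->sched_policy = DL_POLICY_DEADLINE;
    attr->sched_runtime = period_ns * duty_pct / 100;
    attr->sched_deadline = period_ns; /* implicit deadline == period */
    attr->sched_period = period_ns;
    return 0;
}

/* Apply to the calling thread (pid 0); usually needs CAP_SYS_NICE. */
static int dl_set_self(const struct dl_sched_attr *attr)
{
#ifdef SYS_sched_setattr
    return (int)syscall(SYS_sched_setattr, 0, attr, 0);
#else
    (void)attr;
    errno = ENOSYS;
    return -1;
#endif
}
```

Because the bandwidth is declared up front, admission control accounts for it, which is the "known to the scheduler" property mentioned above.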

Thanks,

- Juri

> For other workloads, missing a deadline is about as bad as destroying
> the chip, complete system shutdown might be safer than getting delayed.
> (The very tired scenario of a saw, a laser and your finger; you want to
> shut down the entire machine rather than just cut off your finger.)

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-09 Thread Peter Zijlstra

On Mon, Nov 09, 2015 at 01:23:04PM -0800, Jacob Pan wrote:
> what is WFI?

Wait For Interrupt; very like the x86 HLT thing.

> For Intel, idle states are hints to the HW. The FW decides how far the
> idle can go based on many factors, device states included, some are
> visible to the OS some are not. We just to help mature such deep idle
> conditions.

On some ARM you have to manually orchestrate cluster idle, which is
clustered idle states in cpuidle. The up-side is that you explicitly
know about them, the down side is that it's cross CPU bits and a freak
show (think doing cross CPU atomics while a CPU isn't in the coherency
domain yet).
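The cross-CPU coordination described above is commonly handled with a "last man standing" counter: each CPU decrements a per-cluster count on idle entry, and only the CPU that brings it to zero may attempt cluster-off. A heavily simplified sketch using C11 atomics with hypothetical names; it ignores the coherency-domain problem and the race where a CPU wakes while the last one is powering down, both of which real implementations must handle:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-cluster state: number of CPUs currently awake. */
struct cluster {
    atomic_int awake;
};

static void cluster_init(struct cluster *c, int ncpus)
{
    atomic_init(&c->awake, ncpus);
}

/* Called on idle entry; returns true if this CPU was the last one awake
 * and may therefore try to power down the whole cluster. */
static bool cpu_idle_enter(struct cluster *c)
{
    return atomic_fetch_sub(&c->awake, 1) == 1;
}

/* Called on wake-up; returns true if this CPU is the first one up and
 * must restore cluster-level state before proceeding. */
static bool cpu_idle_exit(struct cluster *c)
{
    return atomic_fetch_add(&c->awake, 1) == 0;
}
```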

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-09 Thread Jacob Pan

On Fri, 6 Nov 2015 21:55:49 +
Dietmar Eggemann  wrote:

> > what i am interested is not per cpu idle state but rather at the
> > package level or domain. It must be an indication for the
> > overlapped idle time. Usually has to come from HW counters.  
> 
> I see. We have a similar problem with the Energy Model (EM) on
> cluster level (sched domain level DIE). We iterate over the cpus of a
> sched group and declare the shallowest cpu idle state as the cluster
> idle state to index our EM. On a typical ARM system we have (active,
> WFI, cpu-off and cluster-off). But I guess for you the idle state
> index is only for core idle states and you can't draw any conclusions
> from this for the package idle states.
what is WFI?
For Intel, idle states are hints to the HW. The FW decides how far the
idle can go based on many factors, device states included, some are
visible to the OS some are not. We just to help mature such deep idle
conditions.

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-09 Thread Jacob Pan

On Mon, 9 Nov 2015 15:15:34 +0100
Peter Zijlstra  wrote:

> On Mon, Nov 09, 2015 at 11:56:51AM +, Punit Agrawal wrote:
> > Jacob Pan  writes:
> > > My take is that RT and throttling will never go well together
> > > since they are conflicting in principle.
> > 
> > I am not sure I follow. If RT (or other higher priority classes)
> > can't be throttled then the CPUs are not able to contribute towards
> > constraining power consumption and hence temperature.
> > 
> > This is especially true in certain platforms where tasks belong to
> > the RT class to maintain user experience, e.g., audio and video. 
> 
> Audio/Video playback generally doesn't take a _lot_ of time these
> days. What is important though is _when_ it happens.
> 
> And media playback typically already has a very well defined and
> stable cadence (24Hz or whatnot).  What you want is for your idle
> injector to sync up with that, not disrupt it.
> 
Agreed, I have tested idle injection on video playback (mostly one CPU
busy, no sync with GPU), and it does not do much to improve energy
efficiency. With the video playback being offloaded, there is no
thermal condition either, so this is outside the scope of what this first
patchset is trying to solve. The ability to sync with an external pattern
could be the next step, kind of like a PLL in HW :).
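The PLL analogy could be sketched as a first-order phase-tracking loop: predict the next frame start from the nominal period, measure the error against the observed start, and correct by a fixed fraction of that error each period. This is only an illustration with assumed names and a fixed gain; the thread does not specify how the external pattern would be observed:

```c
#include <stdint.h>

/* Hypothetical first-order phase tracker for an idle injector. */
struct phase_lock {
    int64_t period_ns; /* nominal cadence, e.g. 1e9/24 for 24 Hz */
    int64_t next_ns;   /* predicted start of the next frame's work */
    int gain_shift;    /* correction = error / 2^gain_shift */
};

/* Feed the observed start time of a frame; returns the corrected
 * prediction for the following frame. */
static int64_t phase_lock_update(struct phase_lock *pl, int64_t observed_ns)
{
    int64_t err = observed_ns - pl->next_ns;
    /* Apply a fraction of the phase error, then advance one period. */
    pl->next_ns += err / (1 << pl->gain_shift) + pl->period_ns;
    return pl->next_ns;
}
```

With gain_shift = 1 the residual phase error halves every period, so the injector's idle slots converge onto the external cadence instead of drifting through it.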

> For other workloads, missing a deadline is about as bad as destroying
> the chip, complete system shutdown might be safer than getting
> delayed. (The very tired scenario of a saw, a laser and your finger;
> you want to shut down the entire machine rather than just cut off
> your finger.)

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-09 Thread Jacob Pan

On Mon, 09 Nov 2015 11:56:51 +
Punit Agrawal  wrote:

> > actually, I was suggesting to start considering idle injection once
> > frequency capped to the energy efficient point, which can be much
> > higher than the lowest frequency. The idea being, deep idle power is
> > negligible compared to running power which allows near linear
> > power-perf scaling for balanced workload.
> > Below energy efficient frequency, continuous lowering frequency may
> > lose disproportion performance vs. power. i.e. worse than linear.
> >  
> 
> I agree. I was making that assumption that with the ability to inject
> idle states, there wouldn't be a need to expose the inefficient
> frequency states.
> 
> Do you still see a reason to do that?
Yes, but it is up to a governor or management SW to decide when to
pick which mechanism. Certain workloads may scale better with frequency
changes, e.g. an unbalanced workload: we don't want to inject idle into
all CPUs if just one is busy. But it is also unlikely to run into thermal
issues in this case.
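The governor-level choice described above can be sketched as a small policy function: if load is concentrated on a few CPUs, lower frequency rather than inject idle everywhere; if load is balanced and frequency is already at the energy-efficient point, use idle injection. The 0..1024 utilization scale, the imbalance threshold and all names are assumptions for illustration:

```c
#include <stddef.h>

enum throttle_mech { THROTTLE_NONE, THROTTLE_DVFS, THROTTLE_IDLE_INJECT };

/*
 * Hypothetical governor policy. util[] holds per-CPU utilization on an
 * assumed 0..1024 scale; imbalance is measured as max - min.
 */
static enum throttle_mech choose_mech(const unsigned *util, size_t n,
                                      unsigned imbalance_thresh,
                                      int need_throttle, int at_efficient_freq)
{
    if (!need_throttle || n == 0)
        return THROTTLE_NONE;
    unsigned lo = util[0], hi = util[0];
    for (size_t i = 1; i < n; i++) {
        if (util[i] < lo) lo = util[i];
        if (util[i] > hi) hi = util[i];
    }
    if (hi - lo > imbalance_thresh)
        return THROTTLE_DVFS; /* unbalanced: don't idle-inject every CPU */
    /* balanced: inject idle only once DVFS reached the efficient point */
    return at_efficient_freq ? THROTTLE_IDLE_INJECT : THROTTLE_DVFS;
}
```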

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-09 Thread Peter Zijlstra

On Mon, Nov 09, 2015 at 11:56:51AM +, Punit Agrawal wrote:
> Jacob Pan  writes:
> > My take is that RT and throttling will never go well together since they
> > are conflicting in principle.
> 
> I am not sure I follow. If RT (or other higher priority classes) can't
> be throttled then the CPUs are not able to contribute towards
> constraining power consumption and hence temperature.
> 
> This is especially true in certain platforms where tasks belong to the
> RT class to maintain user experience, e.g., audio and video. 

Audio/Video playback generally doesn't take a _lot_ of time these days.
What is important though is _when_ it happens.

And media playback typically already has a very well defined and stable
cadence (24Hz or whatnot).  What you want is for your idle injector to
sync up with that, not disrupt it.
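Syncing with the cadence instead of disrupting it could look like this: given a frame period (about 41.7 ms at 24 Hz) and the time the frame's work takes, place the injected idle in the slack after the work rather than at an arbitrary point. A sketch with hypothetical names; it presumes the per-frame busy time is known, which in practice is only approximately true:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: an idle window placed inside one media frame period. */
struct idle_window {
    uint64_t start_ns;
    uint64_t end_ns;
};

/*
 * Place idle_pct% of the frame period immediately after the frame's
 * work (busy_ns) completes, so injected idle never overlaps decode.
 * Returns false if the requested idle does not fit in the slack.
 */
static bool place_idle(uint64_t frame_start_ns, uint64_t period_ns,
                       uint64_t busy_ns, unsigned idle_pct,
                       struct idle_window *w)
{
    if (idle_pct > 100 || busy_ns > period_ns)
        return false;
    uint64_t idle_ns = period_ns * idle_pct / 100;
    if (idle_ns > period_ns - busy_ns)
        return false; /* would eat into the next frame's work */
    w->start_ns = frame_start_ns + busy_ns;
    w->end_ns = w->start_ns + idle_ns;
    return true;
}
```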

For other workloads, missing a deadline is about as bad as destroying
the chip, complete system shutdown might be safer than getting delayed.
(The very tired scenario of a saw, a laser and your finger; you want to
shut down the entire machine rather than just cut off your finger.)


Re: [RFC PATCH 0/3] CFS idle injection

2015-11-09 Thread Punit Agrawal

Jacob Pan  writes:

> On Fri, 06 Nov 2015 16:50:15 +
> Punit Agrawal  wrote:
>
>> * idle injection once frequencies have been capped to the lowest
>>   feasible values (as suggested in the cover letter)
>> 
> actually, I was suggesting to start considering idle injection once
> frequency capped to the energy efficient point, which can be much
> higher than the lowest frequency. The idea being, deep idle power is
> negligible compared to running power which allows near linear
> power-perf scaling for balanced workload.
> Below energy efficient frequency, continuous lowering frequency may
> lose disproportion performance vs. power. i.e. worse than linear.
>

I agree. I was making the assumption that, with the ability to inject
idle states, there wouldn't be a need to expose the inefficient
frequency states.

Do you still see a reason to do that?

>> One question about the implementation in these patches - should the
>> implementation hook into pick_next_task in core instead of CFS? Higher
>> priority tasks might get in the way of idle injection.
> My take is that RT and throttling will never go well together since they
> are conflicting in principle.

I am not sure I follow. If RT (or other higher priority classes) can't
be throttled then the CPUs are not able to contribute towards
constraining power consumption and hence temperature.

This is especially true in certain platforms where tasks belong to the
RT class to maintain user experience, e.g., audio and video. 
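The difference behind the pick_next_task question can be shown with a toy model of class-ordered picking. This is not kernel code: the real core pick_next_task() walks the scheduling classes in priority order, while the structure and function names here are hypothetical. With injection inside the fair class, a runnable RT task still runs during an injection window; checked in the core, every class yields:

```c
#include <stdbool.h>

/* Toy model of class-ordered task picking (highest priority first). */
enum pick { PICK_RT, PICK_FAIR, PICK_IDLE };

struct toy_rq {
    bool rt_runnable;
    bool fair_runnable;
    bool inject_active; /* an idle-injection window is open */
};

/* Injection implemented inside the fair class: only CFS tasks yield. */
static enum pick pick_inject_in_fair(const struct toy_rq *rq)
{
    if (rq->rt_runnable)
        return PICK_RT; /* RT class is consulted before fair */
    if (rq->fair_runnable && !rq->inject_active)
        return PICK_FAIR;
    return PICK_IDLE;
}

/* Injection checked in the core, before any class is consulted. */
static enum pick pick_inject_in_core(const struct toy_rq *rq)
{
    if (rq->inject_active)
        return PICK_IDLE; /* every class yields */
    if (rq->rt_runnable)
        return PICK_RT;
    if (rq->fair_runnable)
        return PICK_FAIR;
    return PICK_IDLE;
}
```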

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-06 Thread Dietmar Eggemann

On 11/06/2015 07:10 PM, Jacob Pan wrote:
> On Fri, 6 Nov 2015 18:30:01 +
> Dietmar Eggemann  wrote:
>
>> On 05/11/15 10:12, Peter Zijlstra wrote:
>>>
>>> People, trim your emails!
>>>
>>> On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
>>>
>>>>>> I also like #2 too. Specially now that it is not limited to a
>>>>>> specific platform. One question though, could you still keep the
>>>>>> cooling device support of it? In some systems, it might make
>>>>>> sense to enable / disable idle injections based on temperature.
>>>
>>>>> One of the key difference between 1 and 2 is that #2 is open loop
>>>>> control, since we don't have CPU c-states info baked into
>>>>> scheduler.
>>>
>>> _yet_, there's people working on that. The whole power aware
>>> scheduling stuff needs that.
>>
>> Isn't the idle state information (rq->idle_state) already used in
>> find_idlest_cpu()?
>>
>> What we use in energy aware scheduling is quite similar but since
>> we're interested in the index information of the c-state (to access
>> the right element of the idle_state vectors of the energy model, we
>> added rq->idle_state_idx.
>
> what i am interested is not per cpu idle state but rather at the package
> level or domain. It must be an indication for the overlapped idle time.
> Usually has to come from HW counters.

I see. We have a similar problem with the Energy Model (EM) on cluster 
level (sched domain level DIE). We iterate over the cpus of a sched 
group and declare the shallowest cpu idle state as the cluster idle 
state to index our EM. On a typical ARM system we have (active, WFI, 
cpu-off and cluster-off). But I guess for you the idle state index is 
only for core idle states and you can't draw any conclusions from this 
for the package idle states.
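The "shallowest CPU idle state" rule is a minimum over the CPUs of the sched group, since the cluster can go no deeper than its least-idle CPU. A sketch assuming a hypothetical index encoding that follows the (active, WFI, cpu-off, cluster-off) list above, with deeper states having larger indices; it is not the actual EAS code:

```c
#include <stddef.h>

/* Hypothetical encoding of per-CPU idle state indices:
 * -1 = active (running), 0 = WFI, 1 = cpu-off, 2 = cluster-off. */
#define IDLE_ACTIVE (-1)

/*
 * Cluster idle index used to look up the energy model: the minimum
 * (shallowest) index over the CPUs of the sched group.
 */
static int cluster_idle_idx(const int *cpu_idle_idx, size_t ncpus)
{
    if (ncpus == 0)
        return IDLE_ACTIVE;
    int shallowest = cpu_idle_idx[0];
    for (size_t i = 1; i < ncpus; i++)
        if (cpu_idle_idx[i] < shallowest)
            shallowest = cpu_idle_idx[i];
    return shallowest;
}
```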


Re: [RFC PATCH 0/3] CFS idle injection

2015-11-06 Thread Jacob Pan

On Fri, 06 Nov 2015 16:50:15 +
Punit Agrawal  wrote:

> * idle injection once frequencies have been capped to the lowest
>   feasible values (as suggested in the cover letter)
> 
Actually, I was suggesting to start considering idle injection once
frequency is capped at the energy-efficient point, which can be much
higher than the lowest frequency. The idea being that deep idle power is
negligible compared to running power, which allows near-linear
power-perf scaling for a balanced workload.
Below the energy-efficient frequency, further lowering the frequency may
cost performance disproportionately to the power saved, i.e. worse than
linear.
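The policy above can be sketched as: find the OPP with the lowest power per unit of frequency (energy per unit of work, under the near-linear perf scaling assumed in the mail), lower frequency down to that point, and below it hold that frequency and inject idle instead, relying on deep idle power being negligible. The OPP table type and helper names are hypothetical, and the returned frequency above the efficient point is not rounded to a real OPP:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical OPP table entry: frequency and active power. */
struct opp {
    uint32_t freq_mhz;
    uint32_t power_mw;
};

/* Most energy-efficient OPP = lowest power/frequency ratio,
 * compared by cross-multiplication to stay in integer arithmetic. */
static size_t efficient_opp(const struct opp *t, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if ((uint64_t)t[i].power_mw * t[best].freq_mhz <
            (uint64_t)t[best].power_mw * t[i].freq_mhz)
            best = i;
    return best;
}

/*
 * Given a performance cap in MHz-equivalents, return the frequency to run
 * at and, via *idle_pct, how much idle to inject. A real governor would
 * round the returned frequency down to an available OPP.
 */
static uint32_t plan_cap(const struct opp *t, size_t n, uint32_t cap_mhz,
                         unsigned *idle_pct)
{
    *idle_pct = 0;
    if (n == 0)
        return cap_mhz;
    uint32_t eff = t[efficient_opp(t, n)].freq_mhz;
    if (cap_mhz >= eff)
        return cap_mhz; /* above the efficient point: DVFS only */
    /* below it: stay efficient, make up the difference with idle */
    *idle_pct = 100 - (unsigned)((uint64_t)cap_mhz * 100 / eff);
    return eff;
}
```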

> One question about the implementation in these patches - should the
> implementation hook into pick_next_task in core instead of CFS? Higher
> priority tasks might get in the way of idle injection.
My take is that RT and throttling will never go well together since they
are conflicting in principle.

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-06 Thread Jacob Pan

On Fri, 6 Nov 2015 18:30:01 +
Dietmar Eggemann  wrote:

> On 05/11/15 10:12, Peter Zijlstra wrote:
> > 
> > People, trim your emails!
> > 
> > On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
> > 
> >>> I also like #2 too. Specially now that it is not limited to a
> >>> specific platform. One question though, could you still keep the
> >>> cooling device support of it? In some systems, it might make
> >>> sense to enable / disable idle injections based on temperature.
> > 
> >> One of the key difference between 1 and 2 is that #2 is open loop
> >> control, since we don't have CPU c-states info baked into
> >> scheduler. 
> > 
> > _yet_, there's people working on that. The whole power aware
> > scheduling stuff needs that.
> 
> Isn't the idle state information (rq->idle_state) already used in
> find_idlest_cpu()?
> 
> What we use in energy aware scheduling is quite similar but since
> we're interested in the index information of the c-state (to access
> the right element of the idle_state vectors of the energy model, we
> added rq->idle_state_idx.
> 
What I am interested in is not the per-CPU idle state but rather the
state at the package or domain level. It must be an indication of the
overlapped idle time, which usually has to come from HW counters.
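To illustrate what "overlapped idle time" means: for two CPUs it is the total length of the intersection of their idle intervals, since package-level idle needs them idle at the same time. A software sketch over hypothetical per-CPU idle traces; as noted above, the authoritative figure on real hardware comes from HW counters, not from a computation like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle interval [start, end) in ns; each CPU's list is
 * sorted and non-overlapping. */
struct interval {
    uint64_t start, end;
};

/* Total time both CPUs were idle at once: a two-pointer sweep over the
 * two sorted lists, summing pairwise intersections. */
static uint64_t overlapped_idle(const struct interval *a, size_t na,
                                const struct interval *b, size_t nb)
{
    uint64_t total = 0;
    size_t i = 0, j = 0;
    while (i < na && j < nb) {
        uint64_t lo = a[i].start > b[j].start ? a[i].start : b[j].start;
        uint64_t hi = a[i].end < b[j].end ? a[i].end : b[j].end;
        if (hi > lo)
            total += hi - lo;
        /* advance whichever interval ends first */
        if (a[i].end < b[j].end)
            i++;
        else
            j++;
    }
    return total;
}
```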

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-06 Thread Dietmar Eggemann

On 05/11/15 10:12, Peter Zijlstra wrote:
> 
> People, trim your emails!
> 
> On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
> 
>>> I also like #2. Especially now that it is not limited to a specific
>>> platform. One question though, could you still keep the cooling device
>>> support for it? In some systems, it might make sense to enable /
>>> disable idle injections based on temperature.
> 
>> One of the key differences between 1 and 2 is that #2 is open-loop
>> control, since we don't have CPU c-states info baked into the scheduler.
> 
> _yet_, there's people working on that. The whole power aware scheduling
> stuff needs that.

Isn't the idle state information (rq->idle_state) already used in
find_idlest_cpu()?

What we use in energy aware scheduling is quite similar, but since we're
interested in the index information of the c-state (to access the right
element of the idle_state vectors of the energy model), we added
rq->idle_state_idx.
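A minimal sketch of why the c-state index is handy: if an energy model stores one idle power value per c-state in an array, the index selects the right element directly. The structures, field names, percentage-based utilization and numbers below are hypothetical simplifications, not the data structures of the energy aware scheduling patches.

```c
#include <stddef.h>

/* Hypothetical idle power for one c-state, in arbitrary units. */
struct idle_state_power {
	unsigned long power;
};

/* Hypothetical, heavily simplified per-CPU energy model. */
struct cpu_energy_model {
	unsigned long busy_power;			/* power while running */
	const struct idle_state_power *idle_states;	/* shallow to deep */
	size_t nr_idle_states;
};

/*
 * Estimate average power over a window in which the CPU is busy for
 * util_pct percent of the time and sits in c-state idle_state_idx for
 * the rest. The index selects the idle_states element directly.
 * Returns 0 for out-of-range input (a real implementation would warn).
 */
static unsigned long estimate_power(const struct cpu_energy_model *em,
				    unsigned int util_pct,
				    size_t idle_state_idx)
{
	if (util_pct > 100 || idle_state_idx >= em->nr_idle_states)
		return 0;

	return (em->busy_power * util_pct +
		em->idle_states[idle_state_idx].power * (100 - util_pct)) / 100;
}
```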

[...]

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-06 Thread Punit Agrawal

Peter Zijlstra  writes:

> People, trim your emails!
>
> On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
>
>> > I also like #2. Especially now that it is not limited to a specific
>> > platform. One question though, could you still keep the cooling device
>> > support for it? In some systems, it might make sense to enable /
>> > disable idle injections based on temperature.
>
>> One of the key differences between 1 and 2 is that #2 is open-loop
>> control, since we don't have CPU c-states info baked into the scheduler.
>
> _yet_, there's people working on that. The whole power aware scheduling
> stuff needs that.
>  
>> To close the loop, perhaps we can export some internal APIs to the
>> thermal subsystem so that the thermal governors can pick the condition
>> to inject idle.
>
> I would much rather that all be part of the power aware stuff, such that
> the scheduler itself is aware of thermal limits and can migrate load
> away if needed.

I was wondering if we could use cpu capacity as the interface between
the thermal sub-system and the scheduler. This would be better than
dealing with frequency caps and idle injection percentages directly in
the scheduler.

We've been playing with making the scheduler respect capacity caps due
to thermal constraints and have tasks migrated away to less capped
cores.

It would be great if in addition to the frequency caps, we could add
idle injection to the arsenal. This would allow building policies on top
such as -

* pure idle injection where frequency capping is unsuitable (or
  unavailable)
* a smooth continuum of capacities using a combination of frequency and
  capacity capping
* idle injection once frequencies have been capped to the lowest
  feasible values (as suggested in the cover letter)
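A rough sketch of the first and third policies, assuming capacity scales linearly with frequency (only an approximation) and using 1024 as the full-capacity value, as the kernel's SCHED_CAPACITY_SCALE does: lower the frequency first, never below a hypothetical floor_khz, and cover any remaining capacity reduction with idle injection. The struct and function names are made up for illustration.

```c
#include <stdint.h>

#define CAPACITY_SCALE 1024u	/* same value as the kernel's SCHED_CAPACITY_SCALE */

/* Hypothetical pair of knobs a thermal policy could set per CPU. */
struct thermal_cap {
	unsigned int freq_khz;	/* frequency cap */
	unsigned int idle_pct;	/* injected idle, 0..100 */
};

/*
 * Effective capacity at freq_khz (out of max_khz) with idle_pct percent of
 * the time forced idle, assuming capacity scales linearly with frequency.
 */
static unsigned int effective_capacity(unsigned int freq_khz,
				       unsigned int max_khz,
				       unsigned int idle_pct)
{
	uint64_t cap = (uint64_t)CAPACITY_SCALE * freq_khz / max_khz;

	return (unsigned int)(cap * (100u - idle_pct) / 100u);
}

/*
 * Meet a target capacity by lowering frequency first, never below
 * floor_khz; whatever reduction is still needed comes from idle injection.
 * Assumes 0 < floor_khz <= max_khz.
 */
static struct thermal_cap cap_for_target(unsigned int target,
					 unsigned int max_khz,
					 unsigned int floor_khz)
{
	unsigned int floor_cap =
		(unsigned int)((uint64_t)CAPACITY_SCALE * floor_khz / max_khz);
	struct thermal_cap c;

	if (target > CAPACITY_SCALE)
		target = CAPACITY_SCALE;

	if (target > floor_cap) {
		/* Frequency scaling alone is enough. */
		c.freq_khz = (unsigned int)((uint64_t)target * max_khz /
					    CAPACITY_SCALE);
		c.idle_pct = 0;
	} else {
		/* Pin at the floor and inject idle for the remainder. */
		c.freq_khz = floor_khz;
		c.idle_pct = floor_cap ? 100u - target * 100u / floor_cap : 100u;
	}
	return c;
}
```

For example, with max_khz = 2000000 and floor_khz = 800000, a target of 512 is met by frequency alone (1000000 kHz, 0% idle), while a target of 204 keeps 800000 kHz and asks for 51% injected idle, giving an effective capacity of 200.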

One question about the implementation in these patches - should the
implementation hook into pick_next_task in core instead of CFS? Higher
priority tasks might get in the way of idle injection.

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-06 Thread Dietmar Eggemann


On 11/06/2015 07:10 PM, Jacob Pan wrote:
> On Fri, 6 Nov 2015 18:30:01 +
> Dietmar Eggemann  wrote:
>
>> On 05/11/15 10:12, Peter Zijlstra wrote:
>>>
>>> People, trim your emails!
>>>
>>> On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:
>>>
>>>>> I also like #2. Especially now that it is not limited to a
>>>>> specific platform. One question though, could you still keep the
>>>>> cooling device support for it? In some systems, it might make
>>>>> sense to enable / disable idle injections based on temperature.
>>>
>>>> One of the key differences between 1 and 2 is that #2 is open-loop
>>>> control, since we don't have CPU c-states info baked into the
>>>> scheduler.
>>>
>>> _yet_, there's people working on that. The whole power aware
>>> scheduling stuff needs that.
>>
>> Isn't the idle state information (rq->idle_state) already used in
>> find_idlest_cpu()?
>>
>> What we use in energy aware scheduling is quite similar, but since
>> we're interested in the index information of the c-state (to access
>> the right element of the idle_state vectors of the energy model), we
>> added rq->idle_state_idx.
>>
> What I am interested in is not the per-CPU idle state but the idle state
> at the package or domain level. It must be an indication of the overlapped
> idle time, which usually has to come from HW counters.


I see. We have a similar problem with the Energy Model (EM) at the cluster
level (sched domain level DIE). We iterate over the CPUs of a sched group
and declare the shallowest CPU idle state to be the cluster idle state,
which we use to index our EM. On a typical ARM system we have four states:
active, WFI, cpu-off and cluster-off. But I guess for you the idle state
index only covers core idle states, so you can't draw any conclusions
from it about the package idle states.
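The cluster-level rule described above can be sketched as follows, assuming hypothetical per-CPU idle-state indexes ordered from shallowest to deepest (WFI, cpu-off, cluster-off) and -1 for a running CPU. As noted, this only works when per-CPU states map onto the cluster state; it says nothing about package C-states that hardware enters on its own.

```c
#include <stddef.h>

/* Hypothetical idle-state indexes, ordered from shallowest to deepest. */
enum {
	IDX_RUNNING     = -1,	/* CPU is not idle */
	IDX_WFI         = 0,
	IDX_CPU_OFF     = 1,
	IDX_CLUSTER_OFF = 2,
};

/*
 * A cluster can be no deeper than its shallowest CPU, so the cluster's
 * idle-state index is the minimum of the per-CPU indexes. An empty group
 * places no constraint and is reported as cluster-off.
 */
static int cluster_idle_state_idx(const int cpu_idx[], size_t nr_cpus)
{
	int idx = IDX_CLUSTER_OFF;

	for (size_t i = 0; i < nr_cpus; i++) {
		if (cpu_idx[i] < idx)
			idx = cpu_idx[i];
	}
	return idx;
}
```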


Re: [RFC PATCH 0/3] CFS idle injection

2015-11-05 Thread Peter Zijlstra


People, trim your emails!

On Wed, Nov 04, 2015 at 08:58:30AM -0800, Jacob Pan wrote:

> > I also like #2. Especially now that it is not limited to a specific
> > platform. One question though, could you still keep the cooling device
> > support for it? In some systems, it might make sense to enable /
> > disable idle injections based on temperature.

> One of the key differences between 1 and 2 is that #2 is open-loop
> control, since we don't have CPU c-states info baked into the scheduler.

_yet_, there's people working on that. The whole power aware scheduling
stuff needs that.
 
> To close the loop, perhaps we can export some internal APIs to the
> thermal subsystem so that the thermal governors can pick the condition
> to inject idle.

I would much rather that all be part of the power aware stuff, such that
the scheduler itself is aware of thermal limits and can migrate load
away if needed.

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-04 Thread Eduardo Valentin

Hello Jacob, Srinivas,

On Wed, Nov 04, 2015 at 09:05:52AM -0800, Srinivas Pandruvada wrote:
> On Wed, 2015-11-04 at 08:58 -0800, Jacob Pan wrote:


> > > > I have two choices for this code:
> > > > 1) be part of the existing powerclamp driver but require exporting
> > > >    some sched APIs.
> > > > 2) be part of sched, since the general rule applies when it comes
> > > >    down to synchronized idle time for best power savings.
> > > > 
> > > > The patches below are for #2. There is a known problem with LOW RES
> > > > timer mode that I am working on. But I am hoping to get review
> > > > earlier.
> > > > 
> > > 
> > > I also like #2. Especially now that it is not limited to a specific
> > > platform. One question though, could you still keep the cooling device
> > > support for it? In some systems, it might make sense to enable /
> > > disable idle injections based on temperature.
> > > 
> > One of the key differences between 1 and 2 is that #2 is open-loop
> > control, since we don't have CPU c-states info baked into the scheduler.
> > To close the loop, perhaps we can export some internal APIs to the
> > thermal subsystem so that the thermal governors can pick the condition
> > to inject idle.


Jacob,

I also like this direction. With the proper APIs exported, creating a
cooling device that uses them would be the natural path. Then, one could
create a thermal zone that plugs in a governor and the idle injection
cooling device built on the exported APIs.
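A userspace model of such a cooling device, assuming the usual three cooling-device callbacks (get_max_state, get_cur_state, set_cur_state) and a linear mapping from cooling state to injected idle percentage. The idle_cdev struct, the 5% step, the 50% cap and sched_set_idle_pct() are hypothetical stand-ins for the exported scheduler API being discussed; a real driver would register struct thermal_cooling_device_ops with the thermal core instead.

```c
#include <errno.h>

#define IDLE_STEP_PCT	5u	/* hypothetical: each cooling state adds 5% idle */
#define MAX_IDLE_PCT	50u	/* hypothetical upper bound on injected idle */

/* Stand-in for the scheduler-side API that would be exported. */
static unsigned int injected_idle_pct;

static void sched_set_idle_pct(unsigned int pct)
{
	injected_idle_pct = pct;
}

/* Minimal model of a cooling device; not the kernel's struct. */
struct idle_cdev {
	unsigned long cur_state;
};

static int idle_get_max_state(struct idle_cdev *cdev, unsigned long *state)
{
	(void)cdev;
	*state = MAX_IDLE_PCT / IDLE_STEP_PCT;
	return 0;
}

static int idle_get_cur_state(struct idle_cdev *cdev, unsigned long *state)
{
	*state = cdev->cur_state;
	return 0;
}

/* A governor picks a state; the device translates it into an idle ratio. */
static int idle_set_cur_state(struct idle_cdev *cdev, unsigned long state)
{
	if (state > MAX_IDLE_PCT / IDLE_STEP_PCT)
		return -EINVAL;
	cdev->cur_state = state;
	sched_set_idle_pct((unsigned int)state * IDLE_STEP_PCT);
	return 0;
}
```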

> > > Was there any particular reason you dropped the cooling device
> > > support?
> > > 
> > I did sysctl instead of thermal sysfs to conform to the rest of the sched
> > tuning knobs. We could also have a proxy cooling device to call the
> > internal APIs mentioned above.

Agreed here then.


> I think we should have a cooling device, as we are already using this
> cooling device. Once it passes the RFC stage, I think we should consider
> adding this.

Srinivas,
Yes, that seems to be a good path to follow. Thanks.


> Thanks,
> Srinivas
> > 
> > Another reason is that I intend to extend this beyond thermal, where we
> > can consolidate/sync idle work in semi-active and balanced workloads.

I see. 

BR,

Eduardo Valentin

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-04 Thread Srinivas Pandruvada

On Wed, 2015-11-04 at 08:58 -0800, Jacob Pan wrote:
> On Tue, 3 Nov 2015 22:06:55 -0800
> Eduardo Valentin  wrote:
> 
> > Hello Jacob,
> > 
> > On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> > > Hi Peter and all,
> > > 
> > > A while ago, we had discussion about how powerclamp is broken in the
> > > sense of turning off idle ticks in the forced idle period.
> > > https://lkml.org/lkml/2014/12/18/369
> > > 
> > > It was suggested to replace the current kthread play idle loop with
> > > a timer-based runqueue throttling scheme. I finally got around to
> > > implementing this and the code is much simpler. I also have good test
> > > results in terms of efficiency, scalability, etc.
> > > http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> > > slide #18+ shows the data on client and server.
> > > 
> > > I have two choices for this code:
> > > 1) be part of the existing powerclamp driver but require exporting
> > >    some sched APIs.
> > > 2) be part of sched, since the general rule applies when it comes
> > >    down to synchronized idle time for best power savings.
> > > 
> > > The patches below are for #2. There is a known problem with LOW RES
> > > timer mode that I am working on. But I am hoping to get review
> > > earlier.
> > > 
> > 
> > I also like #2. Especially now that it is not limited to a specific
> > platform. One question though, could you still keep the cooling device
> > support for it? In some systems, it might make sense to enable /
> > disable idle injections based on temperature.
> > 
> One of the key differences between 1 and 2 is that #2 is open-loop
> control, since we don't have CPU c-states info baked into the scheduler.
> To close the loop, perhaps we can export some internal APIs to the
> thermal subsystem so that the thermal governors can pick the condition
> to inject idle.
> > Was there any particular reason you dropped the cooling device
> > support?
> > 
> I did sysctl instead of thermal sysfs to conform to the rest of the sched
> tuning knobs. We could also have a proxy cooling device to call the
> internal APIs mentioned above.
I think we should have a cooling device, as we are already using this
cooling device. Once it passes the RFC stage, I think we should consider
adding this.
Thanks,
Srinivas
> 
> Another reason is that I intend to extend this beyond thermal, where we
> can consolidate/sync idle work in semi-active and balanced workloads.
> 
> Thanks for the suggestions,
> 
> Jacob
> > BR,
> > 
> > Eduardo Valentin
> > 
> > 
> > > We are entering a very power-limited environment on the client side,
> > > where frequency scaling is only efficient within a certain range, e.g.
> > > on SKL down to ~900MHz. Below that, it is increasingly more efficient
> > > to do C-state insertion, if coordinated.
> > > 
> > > Looking forward, there are use cases beyond thermal/power capping. I
> > > think we can consolidate balanced, partially busy workloads that are
> > > evenly distributed among CPUs.
> > > 
> > > Please let me know what you think.
> > > 
> > > Thanks,
> > > 
> > > 
> > > Jacob Pan (3):
> > >   ktime: add a roundup function
> > >   timer: relax tick stop in idle entry
> > >   sched: introduce synchronized idle injection
> > > 
> > >  include/linux/ktime.h        |  10 ++
> > >  include/linux/sched.h        |  12 ++
> > >  include/linux/sched/sysctl.h |   5 +
> > >  include/trace/events/sched.h |  23 +++
> > >  init/Kconfig                 |   8 +
> > >  kernel/sched/fair.c          | 345 +++
> > >  kernel/sched/sched.h         |   3 +
> > >  kernel/sysctl.c              |  20 +++
> > >  kernel/time/tick-sched.c     |   2 +-
> > >  9 files changed, 427 insertions(+), 1 deletion(-)
> > > 
> > > -- 
> > > 1.9.1
> > > 
> 
> [Jacob Pan]

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-04 Thread Jacob Pan

On Tue, 3 Nov 2015 22:06:55 -0800
Eduardo Valentin  wrote:

> Hello Jacob,
> 
> On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> > Hi Peter and all,
> > 
> > A while ago, we had discussion about how powerclamp is broken in the
> > sense of turning off idle ticks in the forced idle period.
> > https://lkml.org/lkml/2014/12/18/369
> > 
> > It was suggested to replace the current kthread play idle loop with
> > a timer-based runqueue throttling scheme. I finally got around to
> > implementing this and the code is much simpler. I also have good test
> > results in terms of efficiency, scalability, etc.
> > http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> > slide #18+ shows the data on client and server.
> > 
> > I have two choices for this code:
> > 1) be part of the existing powerclamp driver but require exporting
> >    some sched APIs.
> > 2) be part of sched, since the general rule applies when it comes
> >    down to synchronized idle time for best power savings.
> > 
> > The patches below are for #2. There is a known problem with LOW RES
> > timer mode that I am working on. But I am hoping to get review
> > earlier.
> > 
> 
> I also like #2. Especially now that it is not limited to a specific
> platform. One question though, could you still keep the cooling device
> support for it? In some systems, it might make sense to enable /
> disable idle injections based on temperature.
> 
One of the key differences between 1 and 2 is that #2 is open-loop
control, since we don't have CPU c-states info baked into the scheduler.
To close the loop, perhaps we can export some internal APIs to the
thermal subsystem so that the thermal governors can pick the condition
to inject idle.
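One way a thermal governor could close the loop, sketched as a simple proportional controller: raise the idle percentage when the measured temperature is above the trip point and lower it when below. Temperatures are in millidegrees Celsius, as the thermal core reports them; the gain, the clamp and next_idle_pct() itself are hypothetical. An open-loop scheme, by contrast, would keep applying a fixed percentage regardless of temperature.

```c
/*
 * One step of a hypothetical proportional controller for idle injection.
 * temp_mc and trip_mc are in millidegrees Celsius; gain_pct_per_c is how
 * many idle percentage points to add per degree above the trip point.
 * The result is clamped to [0, max_pct].
 */
static unsigned int next_idle_pct(unsigned int cur_pct, int temp_mc,
				  int trip_mc, int gain_pct_per_c,
				  unsigned int max_pct)
{
	int err_c = (temp_mc - trip_mc) / 1000;	/* positive when too hot */
	int next = (int)cur_pct + err_c * gain_pct_per_c;

	if (next < 0)
		next = 0;
	if (next > (int)max_pct)
		next = (int)max_pct;
	return (unsigned int)next;
}
```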
> Was there any particular reason you dropped the cooling device
> support?
> 
I did sysctl instead of thermal sysfs to conform to the rest of the sched
tuning knobs. We could also have a proxy cooling device to call the
internal APIs mentioned above.

Another reason is that I intend to extend this beyond thermal, where we
can consolidate/sync idle work in semi-active and balanced workloads.

Thanks for the suggestions,

Jacob
> BR,
> 
> Eduardo Valentin
> 
> 
> > We are entering a very power-limited environment on the client side,
> > where frequency scaling is only efficient within a certain range, e.g.
> > on SKL down to ~900MHz. Below that, it is increasingly more efficient
> > to do C-state insertion, if coordinated.
> > 
> > Looking forward, there are use cases beyond thermal/power capping. I
> > think we can consolidate balanced, partially busy workloads that are
> > evenly distributed among CPUs.
> > 
> > Please let me know what you think.
> > 
> > Thanks,
> > 
> > 
> > Jacob Pan (3):
> >   ktime: add a roundup function
> >   timer: relax tick stop in idle entry
> >   sched: introduce synchronized idle injection
> > 
> >  include/linux/ktime.h        |  10 ++
> >  include/linux/sched.h        |  12 ++
> >  include/linux/sched/sysctl.h |   5 +
> >  include/trace/events/sched.h |  23 +++
> >  init/Kconfig                 |   8 +
> >  kernel/sched/fair.c          | 345 +++
> >  kernel/sched/sched.h         |   3 +
> >  kernel/sysctl.c              |  20 +++
> >  kernel/time/tick-sched.c     |   2 +-
> >  9 files changed, 427 insertions(+), 1 deletion(-)
> > 
> > -- 
> > 1.9.1
> > 

[Jacob Pan]
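The patch list above includes "ktime: add a roundup function". One plausible use, sketched under assumptions: if every CPU rounds the current time up to a multiple of a shared injection period, all CPUs arm their throttling timers for the same absolute instant without talking to each other, which produces the overlapped idle time that package C-states need. The function names, nanosecond units and percentage-based duty cycle are illustrative and not taken from the patches.

```c
#include <stdint.h>

/*
 * Round t_ns up to the next multiple of period_ns (unchanged if already
 * aligned). Every CPU derives the same boundary independently.
 * Assumes t_ns >= 0 and period_ns > 0.
 */
static int64_t roundup_ns(int64_t t_ns, int64_t period_ns)
{
	int64_t rem = t_ns % period_ns;

	return rem ? t_ns + (period_ns - rem) : t_ns;
}

/* Hypothetical window of forced idle within one injection period. */
struct idle_window {
	int64_t start_ns;
	int64_t end_ns;
};

/*
 * Next synchronized idle window: it starts at the first period boundary
 * at or after now_ns and lasts idle_pct percent of the period.
 */
static struct idle_window next_idle_window(int64_t now_ns, int64_t period_ns,
					   unsigned int idle_pct)
{
	struct idle_window w;

	w.start_ns = roundup_ns(now_ns, period_ns);
	w.end_ns = w.start_ns + period_ns * idle_pct / 100;
	return w;
}
```

Two CPUs that evaluate this at 12345678 ns and 17000001 ns with a 10 ms period both get a window starting at 20000000 ns.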

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-04 Thread Jacob Pan

On Tue, 3 Nov 2015 22:06:55 -0800
Eduardo Valentin  wrote:

> Hello Jacob,
> 
> On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> > Hi Peter and all,
> > 
> > A while ago, we had discussion about how powerclamp is broken in the
> > sense of turning off idle ticks in the forced idle period.
> > https://lkml.org/lkml/2014/12/18/369
> > 
> > It was suggested to replace the current kthread play idle loop with
> > a timer based runqueue throttling scheme. I finally got around to
> > implement this and code is much simpler. I also have good test
> > results in terms of efficiency, scalability, etc.
> > http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> > slide #18+ shows the data on client and server.
> > 
> > I have two choices for this code:
> > 1) be part of existing powerclamp driver but require exporting some
> >sched APIs.
> > 2) be part of sched since the genernal rule applies when it comes
> > down to sycnhronized idle time for best power savings.
> > 
> > The patches below are for #2. There is a known problem with LOW RES
> > timer mode that I am working on. But I am hoping to get review
> > earlier.
> > 
> 
> I also like #2 too. Specially now that it is not limited to a specific
> platform. One question though, could you still keep the cooling device
> support of it? In some systems, it might make sense to enable /
> disable idle injections based on temperature.
> 
One of the key difference between 1 and 2 is that #2 is open loop
control, since we don't have CPU c-states info baked into scheduler. To
close the loop, perhaps we can export some internal APIs to the thermal
subsystem then the thermal governors can pick the condition to inject
idle.
> Was there any particular reason you dropped the cooling device
> support?
> 
I did sysctl instead of thermal sysfs to conform the rest of the sched
tuning knobs. We could also have a proxy cooling device to call
internal APIs mentioned above.

Another reason is that, I intend to extend beyond thermal. Where we can
consolidate/sync idle work in semi-active and balanced workload.

Thanks for the suggestions,

Jacob
> BR,
> 
> Eduardo Valentin
> 
> 
> > We are entering a very power limited environment on client side,
> > frequency scaling can only be efficient at certain range. e.g. on
> > SKL, upto ~900MHz, anything below, it is increasingly more
> > efficient to do C-states insertion if coordinated.
> > 
> > Looking forward, there are use case beyond thermal/power capping. I
> > think we can consolidate ballanced partial busy workload that are
> > evenly distributed among CPUs.
> > 
> > Please let me know what you think.
> > 
> > Thanks,
> > 
> > 
> > Jacob Pan (3):
> >   ktime: add a roundup function
> >   timer: relax tick stop in idle entry
> >   sched: introduce synchronized idle injection
> > 
> >  include/linux/ktime.h|  10 ++
> >  include/linux/sched.h|  12 ++
> >  include/linux/sched/sysctl.h |   5 +
> >  include/trace/events/sched.h |  23 +++
> >  init/Kconfig |   8 +
> >  kernel/sched/fair.c  | 345
> > +++
> > kernel/sched/sched.h |   3 + kernel/sysctl.c
> > |  20 +++ kernel/time/tick-sched.c |   2 +-
> >  9 files changed, 427 insertions(+), 1 deletion(-)
> > 
> > -- 
> > 1.9.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-kernel" in the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

[Jacob Pan]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-04 Thread Srinivas Pandruvada

On Wed, 2015-11-04 at 08:58 -0800, Jacob Pan wrote:
> On Tue, 3 Nov 2015 22:06:55 -0800
> Eduardo Valentin  wrote:
> 
> > Hello Jacob,
> > 
> > On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> > > Hi Peter and all,
> > > 
> > > A while ago, we had discussion about how powerclamp is broken in the
> > > sense of turning off idle ticks in the forced idle period.
> > > https://lkml.org/lkml/2014/12/18/369
> > > 
> > > It was suggested to replace the current kthread play idle loop with
> > > a timer based runqueue throttling scheme. I finally got around to
> > > implement this and code is much simpler. I also have good test
> > > results in terms of efficiency, scalability, etc.
> > > http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> > > slide #18+ shows the data on client and server.
> > > 
> > > I have two choices for this code:
> > > 1) be part of existing powerclamp driver but require exporting some
> > >sched APIs.
> > > 2) be part of sched since the genernal rule applies when it comes
> > > down to sycnhronized idle time for best power savings.
> > > 
> > > The patches below are for #2. There is a known problem with LOW RES
> > > timer mode that I am working on. But I am hoping to get review
> > > earlier.
> > > 
> > 
> > I also like #2 too. Specially now that it is not limited to a specific
> > platform. One question though, could you still keep the cooling device
> > support of it? In some systems, it might make sense to enable /
> > disable idle injections based on temperature.
> > 
> One of the key difference between 1 and 2 is that #2 is open loop
> control, since we don't have CPU c-states info baked into scheduler. To
> close the loop, perhaps we can export some internal APIs to the thermal
> subsystem then the thermal governors can pick the condition to inject
> idle.
> > Was there any particular reason you dropped the cooling device
> > support?
> > 
> I did sysctl instead of thermal sysfs to conform the rest of the sched
> tuning knobs. We could also have a proxy cooling device to call
> internal APIs mentioned above.
I think we should have cooling device as we are already using this
cooling device. Once it pass RFC stage,I think we should consider add
this.
Thanks,
Srinivas
> 
> Another reason is that, I intend to extend beyond thermal. Where we can
> consolidate/sync idle work in semi-active and balanced workload.
> 
> Thanks for the suggestions,
> 
> Jacob
> > BR,
> > 
> > Eduardo Valentin
> > 
> > 
> > > We are entering a very power-limited environment on the client side,
> > > where frequency scaling is only efficient within a certain range. E.g.
> > > on SKL, it is efficient down to ~900MHz; below that, it is
> > > increasingly more efficient to insert C-states, if coordinated.
> > > 
> > > Looking forward, there are use cases beyond thermal/power capping. I
> > > think we can consolidate balanced, partially busy workloads that are
> > > evenly distributed among CPUs.
> > > 
> > > Please let me know what you think.
> > > 
> > > Thanks,
> > > 
> > > 
> > > Jacob Pan (3):
> > >   ktime: add a roundup function
> > >   timer: relax tick stop in idle entry
> > >   sched: introduce synchronized idle injection
> > > 
> > >  include/linux/ktime.h        |  10 ++
> > >  include/linux/sched.h        |  12 ++
> > >  include/linux/sched/sysctl.h |   5 +
> > >  include/trace/events/sched.h |  23 +++
> > >  init/Kconfig                 |   8 +
> > >  kernel/sched/fair.c          | 345 +++
> > >  kernel/sched/sched.h         |   3 +
> > >  kernel/sysctl.c              |  20 +++
> > >  kernel/time/tick-sched.c     |   2 +-
> > >  9 files changed, 427 insertions(+), 1 deletion(-)
> > > 
> > > -- 
> > > 1.9.1
> > > 
> 
> [Jacob Pan]



Re: [RFC PATCH 0/3] CFS idle injection

2015-11-04 Thread Eduardo Valentin

Hello Jacob, Srinivas,

On Wed, Nov 04, 2015 at 09:05:52AM -0800, Srinivas Pandruvada wrote:
> On Wed, 2015-11-04 at 08:58 -0800, Jacob Pan wrote:


> > > > I have two choices for this code:
> > > > 1) be part of the existing powerclamp driver, but require exporting
> > > >    some sched APIs.
> > > > 2) be part of sched, since the general rule applies when it comes
> > > >    down to synchronized idle time for best power savings.
> > > > 
> > > > The patches below are for #2. There is a known problem with LOW RES
> > > > timer mode that I am working on, but I am hoping to get a review
> > > > early.
> > > > 
> > > 
> > > I like #2 too, especially now that it is not limited to a specific
> > > platform. One question though: could you still keep the cooling device
> > > support for it? In some systems, it might make sense to enable /
> > > disable idle injection based on temperature.
> > > 
> > One of the key differences between #1 and #2 is that #2 is open-loop
> > control, since we don't have CPU C-state info baked into the scheduler.
> > To close the loop, perhaps we can export some internal APIs to the
> > thermal subsystem; then the thermal governors can pick the conditions
> > under which to inject idle.


Jacob,

I also like this direction. With the proper APIs exported, creating a
cooling device that uses them would be the natural path. Then one could
create a thermal zone that plugs in a governor and the idle injection
cooling device built on the exported APIs.
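A minimal user-space model of the closed loop described above, assuming a hypothetical idle injection cooling device whose state maps linearly to a forced-idle percentage, driven by a simple governor that steps the state up while the temperature is above a trip point and back down once it falls below the trip minus a hysteresis. The struct and function names and all numbers are illustrative only; in the kernel the state would be exposed through the cooling device callbacks (`get_max_state`, `get_cur_state`, `set_cur_state`) rather than the plain functions below.

```c
/*
 * Hypothetical idle injection cooling device: cur_state in
 * [0, max_state] maps linearly onto [0, max_idle_pct] percent forced idle.
 * max_state must be non-zero.
 */
struct idle_cdev {
	unsigned int cur_state;
	unsigned int max_state;
	unsigned int max_idle_pct;
};

/* Forced-idle percentage corresponding to the current cooling state. */
static unsigned int cdev_idle_pct(const struct idle_cdev *cdev)
{
	return cdev->cur_state * cdev->max_idle_pct / cdev->max_state;
}

/*
 * One governor step (temperatures in millidegrees Celsius): raise the
 * state while above the trip point, lower it once the temperature drops
 * below trip - hyst, and hold it in between to avoid oscillation.
 */
static void governor_step(struct idle_cdev *cdev, int temp_mc,
			  int trip_mc, int hyst_mc)
{
	if (temp_mc > trip_mc) {
		if (cdev->cur_state < cdev->max_state)
			cdev->cur_state++;
	} else if (temp_mc < trip_mc - hyst_mc) {
		if (cdev->cur_state > 0)
			cdev->cur_state--;
	}
}
```

For example, with `max_state = 5` and `max_idle_pct = 50`, each step taken above the trip point adds 10% forced idle.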

> > > Was there any particular reason you dropped the cooling device
> > > support?
> > > 
> > I used sysctl instead of thermal sysfs to conform to the rest of the
> > sched tuning knobs. We could also have a proxy cooling device that calls
> > the internal APIs mentioned above.

Agreed here then.


> I think we should have a cooling device, as we are already using this
> cooling device. Once it passes the RFC stage, I think we should consider
> adding this.

Srinivas, 
Yes, that seems to be a good path to follow. Thanks.


> Thanks,
> Srinivas
> > 
> > Another reason is that I intend to extend this beyond thermal, to cases
> > where we can consolidate/sync idle work in semi-active and balanced
> > workloads.

I see. 

BR,

Eduardo Valentin

Re: [RFC PATCH 0/3] CFS idle injection

2015-11-03 Thread Eduardo Valentin

Hello Jacob,

On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> Hi Peter and all,
> 
> A while ago, we had a discussion about how powerclamp is broken in the
> sense of turning off idle ticks in the forced idle period.
> https://lkml.org/lkml/2014/12/18/369
> 
> It was suggested to replace the current kthread play idle loop with a
> timer-based runqueue throttling scheme. I finally got around to
> implementing this, and the code is much simpler. I also have good test
> results in terms of efficiency, scalability, etc.
> http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> slide #18+ shows the data on client and server.
> 
> I have two choices for this code:
> 1) be part of the existing powerclamp driver, but require exporting some
>    sched APIs.
> 2) be part of sched, since the general rule applies when it comes down
>    to synchronized idle time for best power savings.
> 
> The patches below are for #2. There is a known problem with LOW RES timer
> mode that I am working on, but I am hoping to get a review early.
> 

I like #2 too, especially now that it is not limited to a specific
platform. One question though: could you still keep the cooling device
support for it? In some systems, it might make sense to enable / disable
idle injection based on temperature.

Was there any particular reason you dropped the cooling device support?

BR,

Eduardo Valentin


> We are entering a very power-limited environment on the client side, where
> frequency scaling is only efficient within a certain range. E.g. on SKL,
> it is efficient down to ~900MHz; below that, it is increasingly more
> efficient to insert C-states, if coordinated.
> 
> Looking forward, there are use cases beyond thermal/power capping. I think
> we can consolidate balanced, partially busy workloads that are evenly
> distributed among CPUs.
> 
> Please let me know what you think.
> 
> Thanks,
> 
> 
> Jacob Pan (3):
>   ktime: add a roundup function
>   timer: relax tick stop in idle entry
>   sched: introduce synchronized idle injection
> 
>  include/linux/ktime.h        |  10 ++
>  include/linux/sched.h        |  12 ++
>  include/linux/sched/sysctl.h |   5 +
>  include/trace/events/sched.h |  23 +++
>  init/Kconfig                 |   8 +
>  kernel/sched/fair.c          | 345 +++
>  kernel/sched/sched.h         |   3 +
>  kernel/sysctl.c              |  20 +++
>  kernel/time/tick-sched.c     |   2 +-
>  9 files changed, 427 insertions(+), 1 deletion(-)
> 
> -- 
> 1.9.1
> 

[RFC PATCH 0/3] CFS idle injection

2015-11-02 Thread Jacob Pan

Hi Peter and all,

A while ago, we had a discussion about how powerclamp is broken in the
sense of turning off idle ticks in the forced idle period.
https://lkml.org/lkml/2014/12/18/369

It was suggested to replace the current kthread play idle loop with a
timer-based runqueue throttling scheme. I finally got around to
implementing this, and the code is much simpler. I also have good test
results in terms of efficiency, scalability, etc.
http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
slide #18+ shows the data on client and server.
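A minimal user-space sketch of the window arithmetic such a timer-based scheme might use, assuming all CPUs share one clock, period and idle percentage, so that rounding the current time up to the next period boundary gives every CPU the same forced-idle window. `roundup_ns()`, `next_idle_window()` and `in_forced_idle()` are hypothetical names; this is not the code from the patches, and the ktime roundup helper in patch 1/3 may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: round t (ns, t >= 0) up to a multiple of period. */
static int64_t roundup_ns(int64_t t, int64_t period)
{
	return ((t + period - 1) / period) * period;
}

/* A forced-idle window [start, end) in ns on the shared clock. */
struct idle_window {
	int64_t start;
	int64_t end;
};

/*
 * Next forced-idle window starting at or after 'now'. The idle part sits
 * at the beginning of each period, so CPUs sharing the clock, period and
 * idle_pct all compute the same window.
 */
static struct idle_window next_idle_window(int64_t now, int64_t period,
					   unsigned int idle_pct)
{
	struct idle_window w;

	w.start = roundup_ns(now, period);
	w.end = w.start + period * idle_pct / 100;
	return w;
}

/* True if 'now' (ns, >= 0) lies in the forced-idle part of its period. */
static bool in_forced_idle(int64_t now, int64_t period, unsigned int idle_pct)
{
	return now % period < period * idle_pct / 100;
}
```

Two CPUs that call `next_idle_window()` at different points within the same period get identical `start`/`end` values, which is what lets their forced idle overlap. Times are assumed non-negative, since C's `/` and `%` truncate toward zero for negative operands.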

I have two choices for this code:
1) be part of the existing powerclamp driver, but require exporting some
   sched APIs.
2) be part of sched, since the general rule applies when it comes down
   to synchronized idle time for best power savings.
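A back-of-the-envelope model of why synchronization matters, assuming that the package can only enter its deeper C-states when every core in it is idle at the same time (typical of x86 package C-states), and that uncoordinated CPUs go idle independently of each other. With n CPUs each idle a fraction p of the time, the whole package is idle for about p^n of the time without coordination, versus p when the idle periods are aligned. The function names are hypothetical.

```c
/*
 * Fraction of time all n CPUs are idle at once when each is idle a
 * fraction p of the time, independently of the others (no coordination).
 */
static double package_idle_uncoordinated(double p, int n)
{
	double all_idle = 1.0;

	for (int i = 0; i < n; i++)
		all_idle *= p;
	return all_idle;
}

/* With perfectly aligned idle windows, the CPUs are idle together for p. */
static double package_idle_synchronized(double p)
{
	return p;
}
```

With four CPUs each 50% idle, that is 0.5^4 = 0.0625 (about 6%) versus 50%. Real workloads are not independent random processes, so this only illustrates the trend.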

The patches below are for #2. There is a known problem with LOW RES timer
mode that I am working on, but I am hoping to get a review early.

We are entering a very power-limited environment on the client side, where
frequency scaling is only efficient within a certain range. E.g. on SKL,
it is efficient down to ~900MHz; below that, it is increasingly more
efficient to insert C-states, if coordinated.
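A rough sketch of the policy this paragraph implies, assuming throughput scales linearly with frequency and treating ~900MHz as an efficiency floor: above the floor, lower the frequency alone; below it, hold the floor and inject idle so that freq * (1 - idle) roughly matches the target. `struct op_point` and `choose_operating_point()` are hypothetical names, and the linear model ignores memory-bound work and C-state entry/exit cost.

```c
/* Hypothetical operating point: CPU frequency plus forced-idle share. */
struct op_point {
	unsigned int freq_mhz;
	unsigned int idle_pct;
};

/*
 * Meet a throughput target given in "effective MHz" (floor_mhz > 0).
 * At or above the efficiency floor, scale frequency alone. Below it,
 * stay at the floor and inject idle so that freq * (1 - idle) ~= target;
 * integer division rounds the busy share down, so the effective rate
 * ends up at or slightly below the target.
 */
static struct op_point choose_operating_point(unsigned int target_mhz,
					      unsigned int floor_mhz)
{
	struct op_point op;

	if (target_mhz >= floor_mhz) {
		op.freq_mhz = target_mhz;
		op.idle_pct = 0;
	} else {
		op.freq_mhz = floor_mhz;
		op.idle_pct = 100 - target_mhz * 100 / floor_mhz;
	}
	return op;
}
```

For example, a 450MHz-equivalent target yields 900MHz with 50% forced idle, and a 600MHz target yields 900MHz with 34% idle (the busy share is rounded down to 66%).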

Looking forward, there are use cases beyond thermal/power capping. I think
we can consolidate balanced, partially busy workloads that are evenly
distributed among CPUs.

Please let me know what you think.

Thanks,

Jacob Pan (3):
  ktime: add a roundup function
  timer: relax tick stop in idle entry
  sched: introduce synchronized idle injection

--
1.9.1

