Hi all!

On 12/05/2014 11:30 PM, Stratos Karafotis wrote:
> On 09/05/2014 05:56 PM, Stratos Karafotis wrote:
>> Hi Dirk,
>>
>> On 08/05/2014 11:52 PM, Dirk Brandewie wrote:
>>> On 05/05/2014 04:57 PM, Stratos Karafotis wrote:
>>>> Currently the driver calculates the next pstate proportional to the
>>>> core_busy factor, scaled by the ratio max_pstate / current_pstate.
>>>>
>>>> Using the scaled load (core_busy) to calculate the next pstate
>>>> is not always correct, because there are cases where the load is
>>>> independent of the current pstate. For example, a tight 'for' loop
>>>> running through many sampling intervals will cause a load of 100%
>>>> at every pstate.
>>>>
>>>> So, change the above method and calculate the next pstate with
>>>> the assumption that the next pstate should not depend on the
>>>> current pstate. The next pstate should only be proportional
>>>> to the measured load. Use a linear function to calculate it:
>>>>
>>>> Next P-state = A + B * load
>>>>
>>>> where A = min_pstate and B = (max_pstate - min_pstate) / 100.
>>>> If turbo is enabled, B = (turbo_pstate - min_pstate) / 100.
>>>> The load is calculated using the kernel time functions.
>>>>
>>
>> Thank you very much for your comments and for your time testing my patch!
>>
>>
>>>
>>> This will hurt your power numbers under "normal" conditions where you
>>> are not running a performance workload. Consider the following:
>>>
>>> 1. The system is idle, all cores at min P state and utilization low,
>>>    say < 10%.
>>> 2. You run something that drives the load as seen by the kernel to
>>>    100%, which is scaled by the current P state.
>>>
>>> This would cause the P state to go from min -> max in one step. Which is
>>> what you want if you are only looking at a single core. But this will
>>> also drag every core in the package to the max P state as well. This
>>> would be fine
>>
>> I think this will also happen with the original driver (before your
>> new patch 4/5), after some sampling intervals.
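[For reference, a minimal C sketch of the linear mapping quoted above (Next P-state = A + B * load, with turbo_pstate as the upper bound when turbo is enabled). This is not the actual driver code; the function and parameter names are hypothetical.]

```c
#include <assert.h>

/* Hypothetical sketch of the proposed mapping:
 *   next_pstate = A + B * load
 * where A = min_pstate and B = (top - min_pstate) / 100,
 * with top = turbo_pstate when turbo is enabled, else max_pstate.
 * Integer arithmetic truncates toward min_pstate, as C division does. */
static int next_pstate(int min_pstate, int max_pstate, int turbo_pstate,
                       int turbo_enabled, int load /* 0..100 */)
{
	int top = turbo_enabled ? turbo_pstate : max_pstate;

	return min_pstate + (top - min_pstate) * load / 100;
}
```

With example values min_pstate = 8, max_pstate = 32, turbo_pstate = 38: a 0% load maps to pstate 8, a 100% load to 32 (or 38 with turbo), and a 50% load to 20, independent of the current pstate.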
>>
>>
>>> if the power vs frequency curve were linear, all the cores would finish
>>> their work faster and go idle sooner (race to halt) and maybe spend
>>> more time in a deeper C state, which dwarfs the amount of power we can
>>> save by controlling P states. Unfortunately this is *not* the case; the
>>> power vs frequency curve is non-linear and gets very steep in the turbo
>>> range. If it were linear there would be no reason to have P state
>>> control; you could select the highest P state and walk away.
>>>
>>> Being conservative on the way up and aggressive on the way down gives
>>> you the best power efficiency on non-benchmark loads. Most benchmarks
>>> are pretty useless for measuring power efficiency (unless they were
>>> designed for it) since they measure how fast something can be
>>> done, which is measuring the efficiency at max performance.
>>>
>>> The performance issues you pointed out were caused by commit
>>> fcb6a15c ("intel_pstate: Take core C0 time into account for core busy
>>> calculation") and the problems that ensued from it. These have been
>>> fixed in the patch set:
>>>
>>> https://lkml.org/lkml/2014/5/8/574
>>>
>>> The performance comparison between before/after this patch set, your
>>> patch and ondemand/acpi_cpufreq is available at:
>>> http://openbenchmarking.org/result/1405085-PL-C0200965993
>>> ffmpeg was added to the set of benchmarks because there was a regression
>>> reported against this benchmark as well:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=75121
>>
>> Of course, I agree generally with your comments above. But I believe
>> that we should scale up the core as soon as we measure high load.
>>
>> I tested your new patches and I confirm your benchmark results. But I
>> think they go against the above theory (at least on low loads).
>> With the new patches I get increased frequencies even on an idle system.
>> Please compare the results below.
>>
>> With your latest patches, during mp3 decoding (a non-benchmark load)
>> the energy consumption increased to 5187.52 J from 5036.57 J (almost 3%).
>>
>>
>> Thanks again,
>> Stratos
>>
>
> I would like to explain a little bit further the logic behind this patch.
>
> The patch is based on the following assumptions (some of them are pretty
> obvious, but please let me mention them):
>
> 1) We define the load of the CPU as the percentage of the sampling
>    period that the CPU was busy (not idle), as measured by the kernel.
>
> 2) It's not possible to predict (with accuracy) the load of a CPU in
>    future sampling periods.
>
> 3) The load in the next sampling interval is most likely to be very
>    close to the load in the current sampling interval. (In principle,
>    the load in the next sampling interval could take any value, 0 - 100.)
>
> 4) In order to select the next performance state of the CPU, we need to
>    calculate the load frequently (as fast as the hardware permits) and
>    change the next state accordingly.
>
> 5) At a constant 0% (zero) load over a given period, the CPU performance
>    state should be equal to the minimum available state.
>
> 6) At a constant 100% load over a given period, the CPU performance
>    state should be equal to the maximum available state.
>
> 7) Ideally, the CPU should execute instructions at the maximum
>    performance state.
>
>
> According to the above, if the measured load in a sampling interval is,
> for example, 50%, ideally the CPU should spend half of the next sampling
> period at the maximum pstate and half of the period at the minimum
> pstate. Of course, it's impossible to increase the sampling frequency
> that much.
>
> Thus, we consider that the best approximation would be:
>
> Next performance state = min_perf + (max_perf - min_perf) * load / 100
>
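[The reasoning quoted above can be sketched in a few lines of C: the load is the busy fraction of the sampling period (assumption 1), and the next performance state interpolates linearly between min_perf and max_perf so that 0% maps to the minimum state and 100% to the maximum (assumptions 5 and 6). This is an illustrative sketch only, not driver code; the struct and function names are hypothetical, and kernel ktime values are modeled as plain microsecond counters.]

```c
#include <assert.h>
#include <stdint.h>

/* One sampling period, with times in microseconds (stand-ins for
 * the kernel time functions mentioned in the patch description). */
struct sample {
	uint64_t busy_us;    /* time the CPU was not idle in the period */
	uint64_t period_us;  /* total length of the sampling period */
};

/* Load = percentage of the sampling period the CPU was busy. */
static unsigned int cpu_load(const struct sample *s)
{
	if (s->period_us == 0)
		return 0;
	return (unsigned int)(100 * s->busy_us / s->period_us);
}

/* Linear interpolation between the minimum and maximum states:
 * load == 0   -> min_perf
 * load == 100 -> max_perf */
static int next_perf_state(int min_perf, int max_perf, unsigned int load)
{
	return min_perf + (max_perf - min_perf) * (int)load / 100;
}
```

For the 50% example in the text: a period of 10 ms with 5 ms busy gives a load of 50, and with min_perf = 8, max_perf = 32 the formula selects state 20, the midpoint, as the single-state approximation of spending half the period at each extreme.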
Any additional comments? Should I consider this a rejected approach?

Thanks,
Stratos