Re: Quantitative analysis of -Os vs -O3
Allan Sandfeld Jensen writes:

> Yeah. That is just more problematic in practice. Though I do believe we have
> support for it. It is good to know it will automatically upgrade
> optimizations like that. I just wish there was a way to distribute
> pre-generated arch-independent training data.

autofdo supports that in principle (but it would probably need some improvements in the tools to make it really easy to use, especially with shared libraries).

-Andi
Re: Quantitative analysis of -Os vs -O3
FYI - I’ve updated the stats to include -O2 in addition to -O3 and -Os:

- https://rv8.io/bench#optimisation

There are 57 plots and 31 tables, so it’s quite a bit of data. It will be quite interesting to run these on new GCC releases to monitor changes. The Geomean for -O2 is 0.98 of -O3 on x86-64.

I probably need to add some tables that show file sizes per architecture side by side, versus the current grouping by optimisation level, to allow comparisons between architectures. If I pivot the data, we can add file size ratios by optimisation level per architecture.

Note: these are relatively small benchmark programs; however, the stats are still interesting. I’m most interested in RISC-V register allocation at present.

-O2 does pretty well on file size compared to -O3 on all architectures. At a glance, the -O2 file sizes are slightly larger than the -Os file sizes, but the performance increase is considerably greater. I could perhaps show ratios of performance versus size between -O2 and -Os.

> On 26 Aug 2017, at 10:05 PM, Michael Clark wrote:
>
>> On 26 Aug 2017, at 8:39 PM, Andrew Pinski wrote:
>>
>> On Sat, Aug 26, 2017 at 1:23 AM, Michael Clark wrote:
>>> Dear GCC folk,
>>>
>>> I have to say that GCC’s -Os caught me by surprise after several years
>>> using Apple GCC and more recently LLVM/Clang in Xcode. Over the last year
>>> and a half I have been working on RISC-V development and have been
>>> exclusively using GCC for RISC-V builds, and initially I was using -Os.
>>> After performing a qualitative/quantitative assessment I don’t believe
>>> GCC’s current -Os is particularly useful, at least for my needs, as it
>>> doesn’t provide a commensurate saving in size given the sometimes quite
>>> huge drop in performance.
>>>
>>> I’m quoting an extract from Eric’s earlier email on the Overwhelmed by GCC
>>> frustration thread, as I think Apple’s documentation, which presumably
>>> documents Clang/LLVM -Os policy, is what I would call an ideal -Os
>>> (perhaps using -O2 as a starting point), with the idea that the current
>>> -Os is renamed to -Oz.
>>>
>>> -Oz
>>>     (APPLE ONLY) Optimize for size, regardless of performance. -Oz
>>>     enables the same optimization flags that -Os uses, but -Oz also
>>>     enables other optimizations intended solely to reduce code size.
>>>     In particular, instructions that encode into fewer bytes are
>>>     preferred over longer instructions that execute in fewer cycles.
>>>     -Oz on Darwin is very similar to -Os in FSF distributions of GCC.
>>>     -Oz employs the same inlining limits and avoids string instructions
>>>     just like -Os.
>>>
>>> -Os
>>>     Optimize for size, but not at the expense of speed. -Os enables all
>>>     -O2 optimizations that do not typically increase code size.
>>>     However, instructions are chosen for best performance, regardless
>>>     of size. To optimize solely for size on Darwin, use -Oz (APPLE ONLY).
>>>
>>> I have recently been working on a benchmark suite to test a RISC-V JIT
>>> engine. I have performed all testing using GCC 7.1 as the baseline
>>> compiler, and during the process I have collected several performance
>>> metrics, some that are neutral to the JIT runtime environment. In
>>> particular I have made performance comparisons between -Os and -O3 on
>>> x86, along with capturing executable file sizes, dynamic retired
>>> instruction and micro-op counts for x86, dynamic retired instruction
>>> counts for RISC-V, as well as dynamic register and instruction usage
>>> histograms for RISC-V, for both -Os and -O3.
>>>
>>> See the Optimisation section for a charted performance comparison between
>>> -O3 and -Os. There are dozens of other plots that show the differences
>>> between -Os and -O3.
>>>
>>> - https://rv8.io/bench
>>>
>>> The Geomean on x86 shows a 19% performance hit for -Os vs -O3 on x86. The
>>> Geomean of course smooths over some pathological cases where -Os
>>> performance is severely degraded versus -O3 but not with significant, or
>>> commensurate savings in size.
>>
>> First let me put into some perspective on -Os usage and some history:
>> 1) -Os is not useful for non-embedded users
>> 2) the embedded folks really need the smallest code possible and
>> usually will be willing to afford the performance hit
>> 3) -Os was a mistake for Apple to use in the first place; they used it
>> and then GCC got better for PowerPC to use the string instructions
>> which is why -Oz was added :)
>> 4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.
>>
>> Comparing -O3 to -Os is not totally fair on x86 due to the many
>> different instructions and encodings.
>> Compare it on ARM/Thumb2 or MIPS/MIPS16 (or micromips) where size is a
>> big issue.
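As an aside, the Geomean figures quoted above (e.g. -O2 at 0.98 of -O3) are just the n-th root of the product of the per-benchmark runtime ratios. A minimal sketch of the computation, with made-up ratio values purely for illustration:

```shell
# Geometric mean of per-benchmark runtime ratios (values are hypothetical,
# not taken from the rv8 results).
ratios="0.95 1.02 0.98"   # e.g. -O2 runtime / -O3 runtime per benchmark
geomean=$(echo "$ratios" | awk '{
    p = 1.0
    for (i = 1; i <= NF; i++) p *= $i   # product of the ratios
    printf "%.2f", p ^ (1.0 / NF)       # n-th root of the product
}')
echo "$geomean"
```

The geomean is preferred over the arithmetic mean for ratios because it treats a 2x speedup and a 2x slowdown symmetrically.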
Re: Quantitative analysis of -Os vs -O3
On Samstag, 26. August 2017 12:59:06 CEST Markus Trippelsdorf wrote:
> On 2017.08.26 at 12:40 +0200, Allan Sandfeld Jensen wrote:
> > On Samstag, 26. August 2017 10:56:16 CEST Markus Trippelsdorf wrote:
> > > On 2017.08.26 at 01:39 -0700, Andrew Pinski wrote:
> > > > First let me put into some perspective on -Os usage and some history:
> > > > 1) -Os is not useful for non-embedded users
> > > > 2) the embedded folks really need the smallest code possible and
> > > > usually will be willing to afford the performance hit
> > > > 3) -Os was a mistake for Apple to use in the first place; they used it
> > > > and then GCC got better for PowerPC to use the string instructions
> > > > which is why -Oz was added :)
> > > > 4) -Os is used heavily by the arm/thumb2 folks in bare metal
> > > > applications.
> > > >
> > > > Comparing -O3 to -Os is not totally fair on x86 due to the many
> > > > different instructions and encodings.
> > > > Compare it on ARM/Thumb2 or MIPS/MIPS16 (or micromips) where size is a
> > > > big issue.
> > > > I soon have a need to keep overall (bare-metal) application size down
> > > > to just 256k.
> > > > Micro-controllers are places where -Os matters the most.
> > > >
> > > > This comment does not help my application usage. It rather hurts it
> > > > and goes against what -Os is really about. It is not about reducing
> > > > icache pressure but overall application code size. I really need the
> > > > code to fit into a specific size.
> > >
> > > For many applications using -flto does reduce code size more than just
> > > going from -O2 to -Os.
> >
> > I added the option to optimize with -Os in Qt, and it gives an average 15%
> > reduction in binary size, sometimes as high as 25%. Using lto gives almost
> > the same (slightly less), but the two options combine perfectly and using
> > both can reduce binary size from 20 to 40%. And that is on a shared
> > library, not even a statically linked binary.
> >
> > Only real minus is that some of the libraries, especially QtGui, would
> > benefit from auto-vectorization, so it would be nice if there existed an
> > -O3s version which vectorized the most obvious vectorizable functions; a
> > few hundred bytes for an additional version here and there would do good.
> > Fortunately it doesn't do too much damage, as we have manually vectorized
> > routines to have good performance also on MSVC; if we relied more on
> > auto-vectorization it would be worse.
>
> In that case using profile guided optimizations will help. It will
> optimize cold functions with -Os and hot functions with -O3 (when using
> e.g. "-flto -O3 -fprofile-use"). Of course you will have to compile
> twice and also collect training data from your library in between.

Yeah. That is just more problematic in practice. Though I do believe we have support for it. It is good to know it will automatically upgrade optimizations like that. I just wish there was a way to distribute pre-generated arch-independent training data.

Allan
Re: Quantitative analysis of -Os vs -O3
On 2017.08.26 at 12:40 +0200, Allan Sandfeld Jensen wrote:
> On Samstag, 26. August 2017 10:56:16 CEST Markus Trippelsdorf wrote:
> > On 2017.08.26 at 01:39 -0700, Andrew Pinski wrote:
> > > First let me put into some perspective on -Os usage and some history:
> > > 1) -Os is not useful for non-embedded users
> > > 2) the embedded folks really need the smallest code possible and
> > > usually will be willing to afford the performance hit
> > > 3) -Os was a mistake for Apple to use in the first place; they used it
> > > and then GCC got better for PowerPC to use the string instructions
> > > which is why -Oz was added :)
> > > 4) -Os is used heavily by the arm/thumb2 folks in bare metal
> > > applications.
> > >
> > > Comparing -O3 to -Os is not totally fair on x86 due to the many
> > > different instructions and encodings.
> > > Compare it on ARM/Thumb2 or MIPS/MIPS16 (or micromips) where size is a
> > > big issue.
> > > I soon have a need to keep overall (bare-metal) application size down
> > > to just 256k.
> > > Micro-controllers are places where -Os matters the most.
> > >
> > > This comment does not help my application usage. It rather hurts it
> > > and goes against what -Os is really about. It is not about reducing
> > > icache pressure but overall application code size. I really need the
> > > code to fit into a specific size.
> >
> > For many applications using -flto does reduce code size more than just
> > going from -O2 to -Os.
>
> I added the option to optimize with -Os in Qt, and it gives an average 15%
> reduction in binary size, sometimes as high as 25%. Using lto gives almost
> the same (slightly less), but the two options combine perfectly and using
> both can reduce binary size from 20 to 40%. And that is on a shared
> library, not even a statically linked binary.
>
> Only real minus is that some of the libraries, especially QtGui, would
> benefit from auto-vectorization, so it would be nice if there existed an
> -O3s version which vectorized the most obvious vectorizable functions; a
> few hundred bytes for an additional version here and there would do good.
> Fortunately it doesn't do too much damage, as we have manually vectorized
> routines to have good performance also on MSVC; if we relied more on
> auto-vectorization it would be worse.

In that case using profile guided optimizations will help. It will optimize cold functions with -Os and hot functions with -O3 (when using e.g. "-flto -O3 -fprofile-use"). Of course you will have to compile twice and also collect training data from your library in between.

-- 
Markus
Re: Quantitative analysis of -Os vs -O3
On Samstag, 26. August 2017 10:56:16 CEST Markus Trippelsdorf wrote:
> On 2017.08.26 at 01:39 -0700, Andrew Pinski wrote:
> > First let me put into some perspective on -Os usage and some history:
> > 1) -Os is not useful for non-embedded users
> > 2) the embedded folks really need the smallest code possible and
> > usually will be willing to afford the performance hit
> > 3) -Os was a mistake for Apple to use in the first place; they used it
> > and then GCC got better for PowerPC to use the string instructions
> > which is why -Oz was added :)
> > 4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.
> >
> > Comparing -O3 to -Os is not totally fair on x86 due to the many
> > different instructions and encodings.
> > Compare it on ARM/Thumb2 or MIPS/MIPS16 (or micromips) where size is a
> > big issue.
> > I soon have a need to keep overall (bare-metal) application size down
> > to just 256k.
> > Micro-controllers are places where -Os matters the most.
> >
> > This comment does not help my application usage. It rather hurts it
> > and goes against what -Os is really about. It is not about reducing
> > icache pressure but overall application code size. I really need the
> > code to fit into a specific size.
>
> For many applications using -flto does reduce code size more than just
> going from -O2 to -Os.

I added the option to optimize with -Os in Qt, and it gives an average 15% reduction in binary size, sometimes as high as 25%. Using lto gives almost the same (slightly less), but the two options combine perfectly and using both can reduce binary size from 20 to 40%. And that is on a shared library, not even a statically linked binary.

Only real minus is that some of the libraries, especially QtGui, would benefit from auto-vectorization, so it would be nice if there existed an -O3s version which vectorized the most obvious vectorizable functions; a few hundred bytes for an additional version here and there would do good.
Fortunately it doesn't do too much damage, as we have manually vectorized routines to have good performance also on MSVC; if we relied more on auto-vectorization it would be worse.

Allan
Re: Quantitative analysis of -Os vs -O3
> On 26 Aug 2017, at 8:39 PM, Andrew Pinski wrote:
>
> On Sat, Aug 26, 2017 at 1:23 AM, Michael Clark wrote:
>> Dear GCC folk,
>>
>> I have to say that GCC’s -Os caught me by surprise after several years
>> using Apple GCC and more recently LLVM/Clang in Xcode. Over the last year
>> and a half I have been working on RISC-V development and have been
>> exclusively using GCC for RISC-V builds, and initially I was using -Os.
>> After performing a qualitative/quantitative assessment I don’t believe
>> GCC’s current -Os is particularly useful, at least for my needs, as it
>> doesn’t provide a commensurate saving in size given the sometimes quite
>> huge drop in performance.
>>
>> I’m quoting an extract from Eric’s earlier email on the Overwhelmed by GCC
>> frustration thread, as I think Apple’s documentation, which presumably
>> documents Clang/LLVM -Os policy, is what I would call an ideal -Os
>> (perhaps using -O2 as a starting point), with the idea that the current
>> -Os is renamed to -Oz.
>>
>> -Oz
>>     (APPLE ONLY) Optimize for size, regardless of performance. -Oz
>>     enables the same optimization flags that -Os uses, but -Oz also
>>     enables other optimizations intended solely to reduce code size.
>>     In particular, instructions that encode into fewer bytes are
>>     preferred over longer instructions that execute in fewer cycles.
>>     -Oz on Darwin is very similar to -Os in FSF distributions of GCC.
>>     -Oz employs the same inlining limits and avoids string instructions
>>     just like -Os.
>>
>> -Os
>>     Optimize for size, but not at the expense of speed. -Os enables all
>>     -O2 optimizations that do not typically increase code size.
>>     However, instructions are chosen for best performance, regardless
>>     of size. To optimize solely for size on Darwin, use -Oz (APPLE ONLY).
>>
>> I have recently been working on a benchmark suite to test a RISC-V JIT
>> engine.
>> I have performed all testing using GCC 7.1 as the baseline compiler, and
>> during the process I have collected several performance metrics, some that
>> are neutral to the JIT runtime environment. In particular I have made
>> performance comparisons between -Os and -O3 on x86, along with capturing
>> executable file sizes, dynamic retired instruction and micro-op counts for
>> x86, dynamic retired instruction counts for RISC-V, as well as dynamic
>> register and instruction usage histograms for RISC-V, for both -Os and
>> -O3.
>>
>> See the Optimisation section for a charted performance comparison between
>> -O3 and -Os. There are dozens of other plots that show the differences
>> between -Os and -O3.
>>
>> - https://rv8.io/bench
>>
>> The Geomean on x86 shows a 19% performance hit for -Os vs -O3 on x86. The
>> Geomean of course smooths over some pathological cases where -Os
>> performance is severely degraded versus -O3 but not with significant, or
>> commensurate savings in size.
>
> First let me put into some perspective on -Os usage and some history:
> 1) -Os is not useful for non-embedded users
> 2) the embedded folks really need the smallest code possible and
> usually will be willing to afford the performance hit
> 3) -Os was a mistake for Apple to use in the first place; they used it
> and then GCC got better for PowerPC to use the string instructions
> which is why -Oz was added :)
> 4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.
>
> Comparing -O3 to -Os is not totally fair on x86 due to the many
> different instructions and encodings.
> Compare it on ARM/Thumb2 or MIPS/MIPS16 (or micromips) where size is a
> big issue.
> I soon have a need to keep overall (bare-metal) application size down
> to just 256k.
> Micro-controllers are places where -Os matters the most.

Fair points.

- Size at all costs is useful for the embedded case where there is a restricted footprint.
- It’s fair to compare on RISC-V, which has the RVC compressed ISA extension, conceptually similar to Thumb-2.
- I understand that renaming -Os to -Oz would cause a few downstream issues for those who expect size at all costs.
- There is an achievable use-case for good RVC compression and good performance on RISC-V.

However the question remains: what options does one choose for size, but not size at the expense of speed? -O2 and an -mtune? I’m probably interested in an -O2 with an -mtune that can favour register allocations that result in better RVC compression for RISC-V. Ideally the dominant register set can be assigned to x8 through x15 using loop frequency information; this would result in better compression and also reduce dynamic icache pressure. I think I should look more closely at LRA and see how it uses register_priority. There is a use
RE: Quantitative analysis of -Os vs -O3
> 4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.

Also by x86 in bare-metal firmware, e.g. http://www.uefi.org/

> For many applications using -flto does reduce code size more than just
> going from -O2 to -Os.

Yes. -flto is a must-have, but -Os is still necessary. E.g. UEFI firmware uses both (-flto -Os) when built with GCC. Only -flto + -Os makes a UEFI firmware GCC build competitive with MSVC in terms of code size.

Thanks,
Steven
Re: Quantitative analysis of -Os vs -O3
On 2017.08.26 at 01:39 -0700, Andrew Pinski wrote: > > First let me put into some perspective on -Os usage and some history: > 1) -Os is not useful for non-embedded users > 2) the embedded folks really need the smallest code possible and > usually will be willing to afford the performance hit > 3) -Os was a mistake for Apple to use in the first place; they used it > and then GCC got better for PowerPC to use the string instructions > which is why -Oz was added :) > 4) -Os is used heavily by the arm/thumb2 folks in bare metal applications. > > Comparing -O3 to -Os is not totally fair on x86 due to the many > different instructions and encodings. > Compare it on ARM/Thumb2 or MIPS/MIPS16 (or micromips) where size is a > big issue. > I soon have a need to keep overall (bare-metal) application size down > to just 256k. > Micro-controllers are places where -Os matters the most. > > This comment does not help my application usage. It rather hurts it > and goes against what -Os is really about. It is not about reducing > icache pressure but overall application code size. I really need the > code to fit into a specific size. For many applications using -flto does reduce code size more than just going from -O2 to -Os. -- Markus
Re: Quantitative analysis of -Os vs -O3
On Sat, Aug 26, 2017 at 1:23 AM, Michael Clark wrote:
> Dear GCC folk,
>
> I have to say that GCC’s -Os caught me by surprise after several years
> using Apple GCC and more recently LLVM/Clang in Xcode. Over the last year
> and a half I have been working on RISC-V development and have been
> exclusively using GCC for RISC-V builds, and initially I was using -Os.
> After performing a qualitative/quantitative assessment I don’t believe
> GCC’s current -Os is particularly useful, at least for my needs, as it
> doesn’t provide a commensurate saving in size given the sometimes quite
> huge drop in performance.
>
> I’m quoting an extract from Eric’s earlier email on the Overwhelmed by GCC
> frustration thread, as I think Apple’s documentation, which presumably
> documents Clang/LLVM -Os policy, is what I would call an ideal -Os
> (perhaps using -O2 as a starting point), with the idea that the current
> -Os is renamed to -Oz.
>
> -Oz
>     (APPLE ONLY) Optimize for size, regardless of performance. -Oz
>     enables the same optimization flags that -Os uses, but -Oz also
>     enables other optimizations intended solely to reduce code size.
>     In particular, instructions that encode into fewer bytes are
>     preferred over longer instructions that execute in fewer cycles.
>     -Oz on Darwin is very similar to -Os in FSF distributions of GCC.
>     -Oz employs the same inlining limits and avoids string instructions
>     just like -Os.
>
> -Os
>     Optimize for size, but not at the expense of speed. -Os enables all
>     -O2 optimizations that do not typically increase code size.
>     However, instructions are chosen for best performance, regardless
>     of size. To optimize solely for size on Darwin, use -Oz (APPLE ONLY).
>
> I have recently been working on a benchmark suite to test a RISC-V JIT
> engine. I have performed all testing using GCC 7.1 as the baseline
> compiler, and during the process I have collected several performance
> metrics, some that are neutral to the JIT runtime environment.
> In particular I have made
> performance comparisons between -Os and -O3 on x86, along with capturing
> executable file sizes, dynamic retired instruction and micro-op counts for
> x86, dynamic retired instruction counts for RISC-V as well as dynamic
> register and instruction usage histograms for RISC-V, for both -Os and -O3.
>
> See the Optimisation section for a charted performance comparison between
> -O3 and -Os. There are dozens of other plots that show the differences
> between -Os and -O3.
>
> - https://rv8.io/bench
>
> The Geomean on x86 shows a 19% performance hit for -Os vs -O3 on x86. The
> Geomean of course smooths over some pathological cases where -Os
> performance is severely degraded versus -O3 but not with significant, or
> commensurate savings in size.

First let me put into some perspective on -Os usage and some history:
1) -Os is not useful for non-embedded users
2) the embedded folks really need the smallest code possible and
usually will be willing to afford the performance hit
3) -Os was a mistake for Apple to use in the first place; they used it
and then GCC got better for PowerPC to use the string instructions
which is why -Oz was added :)
4) -Os is used heavily by the arm/thumb2 folks in bare metal applications.

Comparing -O3 to -Os is not totally fair on x86 due to the many
different instructions and encodings.
Compare it on ARM/Thumb2 or MIPS/MIPS16 (or micromips) where size is a
big issue.
I soon have a need to keep overall (bare-metal) application size down
to just 256k.
Micro-controllers are places where -Os matters the most.

> I don’t currently have -O2 in my results, however it seems like I should
> add -O2 to the benchmark suite. If you take a look at the web page you’ll
> see that there is already a huge amount of data given we have captured
> dynamic register frequencies and dynamic instruction frequencies for -Os
> and -O3. The tables and charts are all generated by scripts, so if there
> is interest I could add -O2.
> I can also pretty easily perform runs with new compiler versions as
> everything is completely automated. The biggest factor is that it
> currently takes 4 hours for a full run as we run all of the benchmarks in
> a simulator to capture dynamic register usage and dynamic instruction
> usage.
>
> After looking at the results, one has to question the utility of -Os in
> its present form, and indeed question how it is actually used in practice,
> given the proportion of savings in executable size. After my assessment I
> would not recommend anyone to use -Os because its savings in size are not
> proportionate to the loss in performance. I feel discouraged from using it
> after looking at the results. I really don’t believe -Os makes the right
> trades, e.g. reducing icache pressure can indeed lead to better
> performance due to reduced code size.
Quantitative analysis of -Os vs -O3
Dear GCC folk,

I have to say that GCC’s -Os caught me by surprise after several years using Apple GCC and more recently LLVM/Clang in Xcode. Over the last year and a half I have been working on RISC-V development and have been exclusively using GCC for RISC-V builds, and initially I was using -Os. After performing a qualitative/quantitative assessment I don’t believe GCC’s current -Os is particularly useful, at least for my needs, as it doesn’t provide a commensurate saving in size given the sometimes quite huge drop in performance.

I’m quoting an extract from Eric’s earlier email on the Overwhelmed by GCC frustration thread, as I think Apple’s documentation, which presumably documents Clang/LLVM -Os policy, is what I would call an ideal -Os (perhaps using -O2 as a starting point), with the idea that the current -Os is renamed to -Oz.

-Oz
    (APPLE ONLY) Optimize for size, regardless of performance. -Oz
    enables the same optimization flags that -Os uses, but -Oz also
    enables other optimizations intended solely to reduce code size.
    In particular, instructions that encode into fewer bytes are
    preferred over longer instructions that execute in fewer cycles.
    -Oz on Darwin is very similar to -Os in FSF distributions of GCC.
    -Oz employs the same inlining limits and avoids string instructions
    just like -Os.

-Os
    Optimize for size, but not at the expense of speed. -Os enables all
    -O2 optimizations that do not typically increase code size.
    However, instructions are chosen for best performance, regardless
    of size. To optimize solely for size on Darwin, use -Oz (APPLE ONLY).

I have recently been working on a benchmark suite to test a RISC-V JIT engine. I have performed all testing using GCC 7.1 as the baseline compiler, and during the process I have collected several performance metrics, some that are neutral to the JIT runtime environment.
In particular I have made performance comparisons between -Os and -O3 on x86, along with capturing executable file sizes, dynamic retired instruction and micro-op counts for x86, dynamic retired instruction counts for RISC-V, as well as dynamic register and instruction usage histograms for RISC-V, for both -Os and -O3.

See the Optimisation section for a charted performance comparison between -O3 and -Os. There are dozens of other plots that show the differences between -Os and -O3.

- https://rv8.io/bench

The Geomean on x86 shows a 19% performance hit for -Os vs -O3 on x86. The Geomean of course smooths over some pathological cases where -Os performance is severely degraded versus -O3 but not with significant, or commensurate savings in size.

I don’t currently have -O2 in my results, however it seems like I should add -O2 to the benchmark suite. If you take a look at the web page you’ll see that there is already a huge amount of data given we have captured dynamic register frequencies and dynamic instruction frequencies for -Os and -O3. The tables and charts are all generated by scripts, so if there is interest I could add -O2. I can also pretty easily perform runs with new compiler versions as everything is completely automated. The biggest factor is that it currently takes 4 hours for a full run as we run all of the benchmarks in a simulator to capture dynamic register usage and dynamic instruction usage.

After looking at the results, one has to question the utility of -Os in its present form, and indeed question how it is actually used in practice, given the proportion of savings in executable size. After my assessment I would not recommend anyone to use -Os because its savings in size are not proportionate to the loss in performance. I feel discouraged from using it after looking at the results. I really don’t believe -Os makes the right trades, e.g. reducing icache pressure can indeed lead to better performance due to reduced code size.
I also wonder whether -O2 level optimisations may be a good starting point for a more useful -Os, and how one would proceed towards selecting optimisations to add back to -Os to increase its usability, or rename the current -Os to -Oz and make -Os an alias for -O2. A similar profile to -O2 would probably produce less shock for anyone who does quantitative performance analysis of -Os.

In fact there are some interesting issues for the RISC-V backend, given the assembler performs RVC compression and GCC doesn’t really see the size of emitted instructions. It would be an interesting backend to investigate improving -Os, presuming that a backend can opt in to various optimisations for a given optimisation level. RISC-V would gain most of its size and runtime icache pressure reduction improvements by getting the highest frequency registers allocated within the 8 register set that is