Around 2010, someone who used a code snippet that I had published in a wiki reported that the code didn't work and hung in an endless loop. I soon found out that this was due to a GCC problem, and I got interested in fixing the compiler so that it would work with my code.
One and a half years later, in 2011, I was able to provide a patch that fixed the problem, and it was committed as one of my first additions to GCC. One and a half years -- that was the time it took to get a Copyright Assignment from the FSF, the time to get the paperwork done, just privately for me, with no company's legal issues hampering the process.
Attracted by GCC's Free Software philosophy, and as someone who used the compiler for small µ-controller hobby projects, I had the crazy idea of improving the compiler: fixing bugs, teaching it better code generation like "hey, there is one instruction too many" or "this can be done using fewer registers", turning it into a better piece of software. Improving it until the list of open problems for the avr target would fit onto one screen -- and the screen I used was not a big one.
From time to time I provide snapshot builds for Windows, because most users use that OS (or because it is way harder to build GCC for that host), and to get feedback about new additions. One such piece of feedback, from just 2 weeks ago, was "I would really prefer some 4.x version that just fixes bugs, because each new version increases code size by about 2%".
This weekend I took an old project out of mothballs, just an innocent game on a cathode-ray tube, driven by an AVR µC. After preparing the software so that it compiled with good old v3.4, the results overwhelmed me with complete frustration: every version from v4.7 up to v8 that I tried (six major versions, to be precise) produced code more than 25% larger than v3.4.6 did. And the code is not only bigger, it's all needless bloat that also slows down the result; optimizing for speed might bloat and slow it even more.
All the -fno-tree-loop-optimize (because -ftree-loop-optimize decreases performance), the -fno-reorder-blocks (because BB reordering decreases performance), the -fno-move-loop-invariants (you guess why), all those asm("":"+r"(var)) hacks to push modern GCC in the right direction -- it's light years away from the compiler GCC used to be. When reading the code generated by v3.4.6, it takes some time until you can be sure that it was generated by a compiler and not by a human assembly programmer. Compared to that, the results from a "modern" GCC give the impression of an old, toothless tiger that chews again and again on the same code, chewing more than 300 times, trying to crunch simple, mostly linear code: fruitless and futile attempts to come up with something smart, transforming and analysing the stuff again and again, one pass not knowing what the other passes will do, trampling their results, duplicating the code, using broken cost models, using implicit handling that may help code generation for high-end machines and is deeply interwoven into the compiler's algorithms, working and working until that poor old tiger spits it out again as half-digested pulp.
It just left me stunned, overwhelmed with frustration, feeling ashamed of all the effort and time and dedication that I put into that sink. All those features that GCC has learned since then, new combiner patterns for smarter insns, peepholes for loops, better costs, the work on libgcc asm implementations to squeeze out the last tick, all these SSA passes, all that LTO magic, the whole C++ transition -- all this produces code like that of a drunken, typing monkey. What an antique compiler accomplished in the blink of an eye now takes 10 times or more host resources, just to come up with bloat in almost every place. I didn't even try to find out what is broken. For now it's more than I can bear.
I am just human, so I left it for a wizard as PR81625, for someone who can repair something that's beyond repair -- beyond repair not because it cannot be repaired in principle, not because it's prohibited by some law of nature, but as the natural consequence of a multi-hundred-pass, retargetable compiler design. Even daring to add a simple hook will be rejected as too specific: a hook that won't hurt any other target, that can easily be explained to someone who's not into GCC internals, that doesn't add a single instruction to any other target, that doesn't take longer to execute on the host than a beam of light would need to travel along my body from head to toe.
A bag of more than 20 backend fleas, impossible to contain. Maybe 4 or 5 can be managed at the same time, trained to behave well because they are similar enough to fit the same model, but you'll never get hold of all of those fleas. Some targets will just produce garbage; be happy that they do anything at all and don't ICE or shred your cathode-ray tube -- electrons won't wait.
Let me say thank you for the insights into this piece of real-world software, and for all the contributions I was allowed to add. Maybe I'll return some day and dare again, but for now my ambitions and illusions may die in peace.
Johann