Around 2010, someone who used a code snippet that I had published
in a wiki reported that the code didn't work and hung in an
endless loop.  Soon I found out that it was due to some GCC
problem, and I got interested in fixing the compiler so that
it worked with my code.

1 1/2 years later, in 2011, I could provide a patch to fix the
problem, and it was committed as one of my first additions to GCC.
One and a half years -- that was the time to get a Copyright
Assignment from the FSF, the time to get the paperwork done, just
privately for me, no company's legal issues hampering the process.

Attracted by GCC's Free Software philosophy, and also as someone
who used the compiler for small µ-Controller hobby projects, I had
the crazy idea of improving the compiler, fixing bugs, teaching
it better code generation like "hey, there is an instruction too
much", or "this can be done using fewer registers", turning it into
a better piece of software.  Improving it, until the list of open
problems for the avr target would fit onto one screen -- and the
screen I used was not a big one.

From time to time I provide snapshot builds for Windows,
because most users use that OS (or because it is way harder to
build GCC for that host), and to get feedback about new additions.
One such piece of feedback, from just 2 weeks ago, was "I would really
prefer some 4.x version that just fixes bugs, because each new
version increases code size by about 2%".

This weekend I un-mothballed an old project, just an innocent game on
a cathode-ray tube, driven by some AVR µC.  After preparing the software
so that it compiled with good old v3.4, the results overwhelmed me
with complete frustration:  Every version from v4.7 up to v8 that I
fed the code into (six major versions, to be precise) produced code
more than 25% larger than v3.4.6 did, and the code is not only
bigger, it's all needless bloat that will also slow down the result;
optimizing for speed might bloat and slow it even more.

All the -fno-tree-loop-optimize (because -ftree-loop-optimize
decreases performance), the -fno-reorder-blocks (because BB
reordering decreases performance), the -fno-move-loop-invariants
(you guess why), all those asm("":"+r"(var)) hacks to push modern
GCC in the right direction -- it's light years away from the
compiler GCC used to be.
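
For readers who have not met the asm("":"+r"(var)) trick: the sketch
below shows the general shape of such a hack.  The function name
sum_bytes and the loop are made up for illustration and are not taken
from the game; whether the barrier helps or hurts depends on the
target and the GCC version, so treat it as an example of the idiom,
not as a recommendation.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: sum the bytes of a buffer. */
uint16_t sum_bytes(const uint8_t *buf, uint8_t len)
{
    uint16_t sum = 0;

    for (uint8_t i = 0; i < len; i++) {
        sum += buf[i];

        /* Empty asm statement with a read-write register operand.
           It emits no instruction, but the compiler must assume that
           "sum" is read and modified here, so it has to keep the value
           in a register at this point and cannot transform the
           computation across it.  */
        __asm__ ("" : "+r" (sum));
    }

    return sum;
}
```

The result of the function is unchanged by the asm statement; only
the freedom of the optimizers is reduced.  Such code would typically
be built together with the options named above, for example with
-Os -fno-tree-loop-optimize -fno-reorder-blocks
-fno-move-loop-invariants (the -Os is an assumption here).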

When reading the code generated by v3.4.6, it takes some time until
you can be sure that it's generated by a compiler and not by a human
assembler programmer.  Compared to that, the results from a
"modern" GCC give the impression of an old, toothless tiger that
chews again and again on the same code, chewing more than 300 times,
trying to crunch a simple, mostly linear code, fruitless and futile
attempts to come up with something smart, transforming and analysing
the stuff again and again, one pass not knowing what the other passes
will do, trampling their results, duplicating the code, using broken
cost models, using implicit handling that may help code generation for
bolides, deeply interwoven into the compiler's algorithms and working,
until that old, poor tiger spits it out again, as half-stomached pulp.

It just left me stunned, overwhelmed with frustration, feeling
ashamed of all the effort and time and dedication that I put into
that sink.  All those features that GCC learned since then, new combiner
patterns for smarter insns, peepholes for loops, better costs, working
on libgcc asm implementations to squeeze out the last tick, all these
SSA passes, all that LTO magic, all the C++ transition -- all this
produces code as if from a drunken, typing monkey.

What an antique compiler accomplished in the blink of an eye
now takes 10 times or more host resources, just to come up with
bloat in almost every place.

I didn't even try to find out what is broken. For now it's more than I
can bear. I am just human, so I left it for a wizard as PR81625, for
someone who can repair something that's beyond repair -- beyond repair
not because it cannot be repaired in principle, not because it's
prohibited by some laws of nature, but as the natural consequence
of a multi-hundred pass, re-targetable compiler design.

Even daring to propose a simple hook that won't hurt any other target,
that can easily be explained to someone who's not into GCC internals,
gets rejected as too specific: something that doesn't add a single
instruction to any other target, something that doesn't take longer to
execute on the host than a beam of light would need to pass alongside
my body from head to toe.

A bag of more than 20 backend fleas, impossible to contain.
Maybe 4 or 5 can be managed at the same time, being trained to good
behaviour because they are similar enough to fit the same model,
but you'll never get hold of all of those fleas; some targets will
just produce garbage, be happy that they do anything at all and
don't ICE or shred your cathode-ray tube - electrons won't wait.

Let me say thank you for the insights into this piece of real-world software,
for all the contributions I was allowed to add.  Maybe I'll return some
day and dare again, but for now my ambitions and illusions may die in
peace.

Johann
