On Tuesday, June 14, 2005, at 10:08 PM, Richard Henderson wrote:
Didn't RTH object the last time?
One has to do a less gross job of it than Red Hat did.
I did go back and reread all the useful content you and others
gave. I did expect that all past concerns raised remain and that we'd
be expected to find some `reasonable' way to address all those, and all
the ones that would come up during the development of the feature. You
and Mark I think raised the highest bars, but I remain hopeful that
there is a solution that is low enough impact in other areas of the
compiler, and yet adds the required functionality. I think the CW asms
stuff was a good learning experience for us.
I suppose I could be prodded into pulling out the code
I don't want to so prod you right now. If/when we start up a project
to do it, I'd be interested in things like testsuite (testcases) and
the description for how the parser was wired in for C++, if you liked
the way it was done, as our CW version feels too duplicative and more
invasive than ultimately I think I'd like.
You'll need an EXTREMELY large testsuite. You'll find that the MS
documentation is useless, and you'll have to deduce the desired
semantics from customer code bases.
Yeah, been there, done that. Ultimately, I'm sure we'd miss some
important bits in the first couple of versions, but as people pumped
code through it, we'd get them in. I am not confident that I could get
enough help (enough testcases) fast enough to have it all done in the
first stage.
I don't recall if Darwin uses %ebx for pic code like ELF. If you
do, expect to find that lots of user code expects to be able to
clobber it, because Windows doesn't do pic code at all, and so
reserves no such register.
Ultimately, one just has to slowly and carefully describe it for the
optimizer, in the existing language the optimizer uses. The CW asms
were done using ASM_EXPRs, and we got a lot of mileage out of that
route. Fundamentally, if other constructs are a better match, it would
be better to use them. Take, for example, a __builtin_opcode_blabla:
for blabla r1, r2, n1, instead of generating asm ("blabla .... ), we'd
call __builtin_opcode_blabla (r1, r2, n1) and let it generate whatever
it does. If all instructions had a gen_blabla () or
__builtin_opcode_blabla (), one could trivially avoid any ASM_EXPRs and
just use them all. The advantage of that style is that the optimizer
can know what is going on, and that, ultimately, I think is required.
Also, I can see how to wire that style of support into the compiler
cleanly. The CW support didn't require invasive changes to the
optimizer. I'm hoping the worst case scenario for x86 support would be
the i387 fp stack and the carry flags and other status bits, and not
something like reload.
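
To make the asm-versus-builtin contrast concrete, here is a rough sketch
using a real instruction in place of the made-up blabla. It is only an
illustration, not how the CW or MS asm support is wired in: bswap and
the existing __builtin_bswap32 stand in for blabla and the hypothetical
__builtin_opcode_blabla, and the function names swap_via_asm and
swap_via_builtin are mine.

```c
#include <stdint.h>

/* Opaque form: the optimizer sees a black box with one in/out register
   operand and cannot reason about what the instruction computes. */
static inline uint32_t swap_via_asm(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("bswap %0" : "+r"(x));
#else
    /* Assumption: on non-x86 hosts, fall back to plain C so the sketch
       still builds; the asm path is the one being illustrated. */
    x = (x >> 24) | ((x >> 8) & 0x0000ff00u)
        | ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
    return x;
}

/* Transparent form: the compiler knows the semantics of the builtin, so
   it is free to constant-fold it or combine it with neighbouring code. */
static inline uint32_t swap_via_builtin(uint32_t x)
{
    return __builtin_bswap32(x);
}
```

Both functions return the same value for the same input; the difference
is only in how much the optimizer is told about what happens inside.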
I suspect that one could get quite a lot of mileage out of parsing
the assembly code and turning most of it into straight GIMPLE,
Yes, this is the direction I'd want to go.
Just so others have an idea of how invasive it might be, the CW asms are
strapped in with 67 multiline hunks and 33 single line changes. I'd
hope/expect that MS asms would be similar.