Re: [C--] C-- status report: yes, someone is (barely) out here

Brian Hurt Sun, 17 Dec 2006 10:19:19 -0800

Thanks for the status report.  Some comments that may or may not help...

First of all, let me say that I think C-- did hit the nail on the head. 
It was successful in doing something that I think needed to be done.


Speaking as someone who has been wandering around looking at alternatives 
for implementing a functional language, the pros and cons of the 
various options are:

- GCC: Still quite complicated to work with, still requires you to write 
your compiler in C.  Implementing a decent type system is going to be 
interesting enough in Ocaml or Haskell, I'll pass on doing that in C. 
Which means a hybrid compiler, with a lot more complexity.  Also, 
functional languages are definitely still second class citizens in GCC 
world- things like tail call optimization are still not where they need to 
be.  Which means implementing an optimization layer above GCC to deal with 
tail calls.  Plus you still have all the run time library issues you need 
to deal with- you still need to write a GC, exception handlers, threading, 
etc.  On the plus side, you do get a lot of fancy optimizations- SSE use, 
etc.

- Java/C#: The big advantage of targeting these run times is that with a 
little work in the language, you can take advantage of the huge libraries 
these runtimes have.  This is the big advantage of an F# or Acute.  And you 
get at least middling decent optimization.  But only middling- Haskell's 
pure laziness costs about as much performance as Java's virtual machine 
architecture, as an approximation.

You don't have to write your own gc or exception handlers, but this is 
because you don't have a choice- you get to use what the runtime provides 
you.  And the ones the runtime provides are decidedly suboptimal for 
functional languages, especially in the GC department.  Don Syme of F# 
asked the CLR people if they could maybe tune the GC to work better with 
programs with high rates of allocation (aka functional programs), and got 
told that high rates of allocation were a bug.  Plus, I think tail call 
optimization is even less advanced in these environments than in GCC.  So 
again you're hitting the functional programming as a second class citizen 
problem.

- Write your own back end.  This is a boatload of work (as you guys well 
know)- approximately as much work as everything else in the compiler put 
together.  Also note that you still have all the run time issues to deal 
with.

- Write your own virtual machine.  And just eat the performance.  Note 
that again you still need to implement your own GC, threading, etc.  For 
performance, you're aiming at Ruby/PHP/TCL/Bourne shell level of 
performance- the level below Java/C#/Haskell.  Where functional 
programming really shines, I think, is programming in the large- word 
processors and CAD/CAM systems etc.  It's when you start dealing with 
things like maintenance and large scale reuse and multithreading that 
functional programming really spreads its wings and flies.  And, unlike 
scripting/web programming, performance really does matter.  Even stepping 
down to Java-level performance becomes a problem (see OpenOffice).  For 
"proof of concept" languages this may not be a problem, but if you ever 
want your language to have even a remote possibility of being used by 
anyone other than yourself, this isn't acceptable.

- Use C as a back-end.  You're writing your own runtime again, tail 
recursion is poorly supported again, and a lot of functional programming 
constructs don't map well to C.

- Use C--.  You still have to implement your runtime, but you're basically 
going to have to do that anyways.  You get decent optimization, you get to 
write your compiler in the language you want to, and functional languages 
are first class languages.
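
To make the tail call complaint in the GCC and C options concrete, here is 
a minimal sketch of the kind of layer you end up writing when the back end 
won't eliminate tail calls for you: a trampoline.  It is in Python purely 
for illustration, and every name in it (Bounce, trampoline, is_even, 
is_odd) is hypothetical, not part of GCC, C, or C--.

```python
class Bounce:
    """A deferred tail call: 'call fn(*args) next' instead of calling it now."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(fn, *args):
    """Run fn, then keep running deferred tail calls in a flat loop.

    The stack never grows, because each function returns to this loop
    before the next one is entered.
    """
    result = fn(*args)
    while isinstance(result, Bounce):
        result = result.fn(*result.args)
    return result


# Mutually recursive functions, written so that every tail call is
# returned as a Bounce rather than performed directly.
def is_even(n):
    if n == 0:
        return True
    return Bounce(is_odd, n - 1)


def is_odd(n):
    if n == 0:
        return False
    return Bounce(is_even, n - 1)
```

Calling trampoline(is_even, 100000) runs in constant stack space, but 
every tail call now pays for an allocation and an extra return.  That 
overhead is what a back end with real tail calls saves you.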

Of these options, I think C-- (assuming it's not a dead project) is the 
best of the lot.  Even if it needs some work (an x86-64 back end, the 
ability to move a stack frame from one stack to another), it'll be no more 
work than any other option.  My second choice would be GCC as a back end, 
I think.  But the point here is that the fundamental niche C-- fills is 
still useful and needed.

Success in open source projects is mainly a matter of luck.  This is 
especially true for infrastructure projects, whose success depends upon 
some other successful project using them.  I've been involved in 
uncounted email "discussions" (aka flamewars) along the lines of "what 
does this project need to do to be successful", and I've seen all sorts 
of proposals and seen pretty much all of them shot down.  To my knowledge, 
there seem to be two requirements: 1) write code that works, and 2) get 
lucky.

Some forms of luck you can make for yourself, however.  One of the big 
strokes of luck that C-- lacks is a big, well known project that uses it. 
To give an example not quite at random, if a functional language luminary 
like Simon Peyton Jones were to start a project to design the successor to 
Haskell and Ocaml using C-- as the back end, that'd be a huge boost to 
C--.  Success of the Haskell++ language using C-- would bring visibility, 
credibility, and developers to C--.

I'm not sure if he'd be interested in such a project (even if he had 
volunteers, like yours truly, to do a lot of lifting).  I mention this 
because this is effectively why I'm looking at C-- (modulo SPJ's 
involvement).

I am less convinced that converting C-- to C is all that important.  Maybe 
C-- to GCC's back end.  But the main benefit that would bring is support 
for more architectures.  If anything, the architecture realm has gotten 
simpler since 1998.  The Alpha, PA-RISC, and MIPS architectures are dead, 
Itanium is a no-show, and Apple has dropped the PowerPC for the x86.  By 
the time there is any demand for the Power/PowerPC or Sparc architectures 
to be supported, there will be more than enough developer interest to do 
so.  So the only architectures that really matter are x86-32 and x86-64.

The fancy optimizations (autovectorization, etc.) you'd get from engaging 
a C backend would be nice, but I don't think they're necessary.  The vast 
bulk of the performance boost native code gets is due to the lack of JIT 
compiling costs, register allocation, and simple peephole optimization. 
Well, and functional programming specific optimizations (uncurrying, 
etc.).  The "fancy optimizations" give you maybe 10-30% over basic 
optimizations.  You could easily hit Ocaml-level speeds with C--.
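
For anyone unsure what uncurrying buys you, here is a rough sketch, again 
in Python only for illustration and with hypothetical function names.  The 
curried form builds one closure per argument; the uncurried form, which is 
what the optimization rewrites a fully applied call into, is a single 
direct call.

```python
def add3_curried(a):
    # Each application returns a new closure capturing the arguments so far.
    def take_b(b):
        def take_c(c):
            return a + b + c
        return take_c
    return take_b


def add3_uncurried(a, b, c):
    # Same function after uncurrying: all arguments arrive in one call,
    # so no intermediate closures are allocated.
    return a + b + c
```

A call like add3_curried(1)(2)(3) allocates two closures before it does 
any arithmetic, while add3_uncurried(1, 2, 3) allocates none.  A compiler 
can only make this rewrite where it can see that all the arguments are 
supplied at once.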

As for the generic run-time library, I think this is fraught with dangers. 
The biggest of which is the "you didn't do things exactly the way I wanted 
them done in my language, so it's worthless" syndrome.  The original 
design of C-- had it dead right, IMHO.  Things like exception handling and 
garbage collection are too heavily impacted by the language design and 
goals to have generic implementations.  For example, how do you know 
what's a pointer and what isn't?  An Ocaml-like language may want to 
decide to take the low order bit of every integer as a tag bit.  A more 
Java-like language may decide to use the reflection capabilities of the 
language to do GC.  Do you allow objects in older generations to hold 
pointers to objects in younger generations?  A functional language might 
say no, an imperative language might say yes.  Are destructors common or 
uncommon?  Try to please everyone, and you'll likely end up with neither 
fish nor fowl, or a huge white elephant ("a mouse built to government 
specifications")- something that's equally useless to everyone.
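
As an illustration of the tag bit choice, here is a minimal sketch of the 
low-order-bit scheme.  It assumes heap pointers are at least 2-byte 
aligned, so a real pointer always has a low bit of 0.  The function names 
are hypothetical, and Python is used only to show the arithmetic.

```python
def tag_int(n):
    """Encode an integer as a word whose low bit is 1."""
    return (n << 1) | 1


def untag_int(word):
    """Recover the integer; the arithmetic shift keeps the sign."""
    return word >> 1


def is_int(word):
    return word & 1 == 1


def is_pointer(word):
    # Assumption: aligned heap addresses always have a low bit of 0.
    return word & 1 == 0


def pointers_in(frame):
    """What a collector would trace: only the words that look like pointers."""
    return [w for w in frame if is_pointer(w)]
```

The collector can tell integers from pointers with a single bit test and 
no type information, at the price of one bit of integer range and some 
extra shifting in arithmetic.  A language that wants full-width integers 
would reject this trade, which is exactly why a generic GC has trouble 
pleasing everyone.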

Were I to implement such a library, I'd be inclined to do it primarily as 
the runtime to my language.  Which would at least guarantee one user, and 
guarantee it fits the needs of one language.  But again, the success of 
the library would be primarily based on the success of the language built 
on top of it.

I think the three new things I'd like to see out of C-- are (in rough 
order of priority):
1) x86-64 support
2) the ability to move/copy a stack frame from one stack to another, and
3) Some form of inline assembler without having to go to C (necessary for 
writing threading primitives in C--)

I am contemplating just adding those capabilities.

Brian
_______________________________________________
Cminusminus mailing list
[email protected]
https://cminusminus.org/mailman/listinfo/cminusminus
