Re: [Haskell-cafe] I want to write a compiler

2009-03-07 Thread Rick R
http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours

Not sure if this is exactly what you want, but you could certainly fufill
all of your requirements using this as a baseline. Instead of evaluating the
actual statements in your eval function, you could simply render them to C.

As far as FFI, if you are statically compiling to C, you can leave your
variables unboxed. This would allow you to call C functions directly.

Hope that helps.


On Sat, Mar 7, 2009 at 6:32 PM, Loup Vaillant wrote:

> This is a homework question. I am mainly looking for guidance or
> pointers. Source code, even. (Not GHC's, though, it is too complicated
> for me right now. The AST of STG is fine, but the rest kinda scares
> me.)
>
> Ideally, I would very much like to compile to C.
>
> The requirements are easily stated. My language must
> -> be lazy by default,
> -> support algebraic data types and case expressions (unless I can get
> away with encoding them as functions),
> -> have some kind of FFI with C (I suppose it imply some support for
> unboxed values).
>
> There is also the first class functions thing, but lambda lifting is okay.
>
> I don't care about any kind of execution efficiency. I just want my
> compiler to be As Simple As Possible. Eventually, I intend to
> bootstrap. Then, I will start dreaming about doing better than the
> Mighty Simons. (I can dream, right?)
>
> I figured I would start by the back-end. I began to write 15 pathetic
> lines of code to start an STG machine. Needles to say, I am stuck. I
> don't know where to start.
>
> I am already familiar with some papers. In particular, [1] (on
> template instantiation), [2] and [3]. I once wrote a simple graph
> reducer using template instantiation (about 20 lines at most).
>
> I have chosen the STG machine because I thought i would be easier to
> get an FFI and case exprs out of it. Maybe I am wrong, and template
> instantiation is really easier. There is also the BRISK machine, and
> GRIN. But the paper I read were less readable to me than the three
> mentioned.
>
> So, If any of you can give me some pointer or guidance or advice about
> where to start, I would be very grateful.
>
> Loup
>
>
> [1] http://www.cs.otago.ac.nz/cosc459/books/pjlester.pdf
> [2]
> http://research.microsoft.com/~simonpj/Papers/spineless-tagless-gmachine.ps.gz
> [3] 
> http://research.microsoft.com/~simonpj/Papers/eval-apply/index.htm
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>



-- 
We can't solve problems by using the same kind of thinking we used when we
created them.
   - A. Einstein
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] I want to write a compiler

2009-03-07 Thread Jeremy Shaw
Hello,

This book is pretty good IMO:

http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/

It does not cover STG, but it does cover a ton of useful stuff and is
very well written.

I can say from experience that the spineless-tagless-gmachine paper
you referenced does contain all the information needed to implement an
STG->C compiler. But, you should expect to spend a lot of time
alternating between trying to implement the thing and rereading the
paper.

Also, you may find that it is easier to do STG->Assembly at first (via
harpy or something). With assembly it is easier to express exactly
what you want to happen. With C you have to try to figure out how to
trick the C compiler into doing your bidding.

Also, maybe it is more satisfying to output assembly, because then you
really feel like you are writing a compiler ;)

I am sure others will say that LLVM is a better option than C or
assembly. I know almost nothing about LLVM, so they may be right.

-- jeremy

At Sun, 8 Mar 2009 01:32:48 +0200,
Loup Vaillant wrote:
> 
> This is a homework question. I am mainly looking for guidance or
> pointers. Source code, even. (Not GHC's, though, it is too complicated
> for me right now. The AST of STG is fine, but the rest kinda scares
> me.)
> 
> Ideally, I would very much like to compile to C.
> 
> The requirements are easily stated. My language must
> -> be lazy by default,
> -> support algebraic data types and case expressions (unless I can get
> away with encoding them as functions),
> -> have some kind of FFI with C (I suppose it imply some support for
> unboxed values).
> 
> There is also the first class functions thing, but lambda lifting is okay.
> 
> I don't care about any kind of execution efficiency. I just want my
> compiler to be As Simple As Possible. Eventually, I intend to
> bootstrap. Then, I will start dreaming about doing better than the
> Mighty Simons. (I can dream, right?)
> 
> I figured I would start by the back-end. I began to write 15 pathetic
> lines of code to start an STG machine. Needles to say, I am stuck. I
> don't know where to start.
> 
> I am already familiar with some papers. In particular, [1] (on
> template instantiation), [2] and [3]. I once wrote a simple graph
> reducer using template instantiation (about 20 lines at most).
> 
> I have chosen the STG machine because I thought i would be easier to
> get an FFI and case exprs out of it. Maybe I am wrong, and template
> instantiation is really easier. There is also the BRISK machine, and
> GRIN. But the paper I read were less readable to me than the three
> mentioned.
> 
> So, If any of you can give me some pointer or guidance or advice about
> where to start, I would be very grateful.
> 
> Loup
> 
> 
> [1] http://www.cs.otago.ac.nz/cosc459/books/pjlester.pdf
> [2] 
> http://research.microsoft.com/~simonpj/Papers/spineless-tagless-gmachine.ps.gz
> [3] http://research.microsoft.com/~simonpj/Papers/eval-apply/index.htm
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] I want to write a compiler

2009-03-07 Thread Austin Seipp
Hi,

(Please note this is coming from my own experience working with the LHC haskell
compiler, as well as a compiler I'm currently working on in SML. I'm
not an authority, but as another greenhorn compiler hacker I thought I
might give some advice.)

Excerpts from Loup Vaillant's message of Sat Mar 07 17:32:48 -0600 2009:
> Ideally, I would very much like to compile to C.
> 
> The requirements are easily stated. My language must
> -> be lazy by default,
> -> support algebraic data types and case expressions (unless I can get
> away with encoding them as functions),
> -> have some kind of FFI with C (I suppose it imply some support for
> unboxed values).
> 
> There is also the first class functions thing, but lambda lifting is okay.

Unfortunately, after working on LHC, I can verifiably say (like all
the researchers who wrote the papers which I *didn't* read
beforehand,) that for a lot of purposes, C is an unsuitable fit for a
compilers' target language. It works pretty well if you can make the
source language execution model fit well enough, but if you can't,
you've a price to pay (both in sanity and/or performance.)

(On that note, I am currently of the opinion that most of LHC's major
deficiencies, aside from a few parser bugs or some needed
optimizations, comes from the fact that compiling to C is currently
our only option; because of it, we have no exception handling or
proper garbage collection at all. As well, the runtime system is a
little needlessly 'clever' (if small and understandable) so it can
deal with that.)

However, since your goal is *not* efficiency, you will be happy to
know that the issues with compiling to C (like garbage collection and
exception handling) can be worked around viably.

For garbage collection, please see.

"Accurate Garbage Collection in an Uncooperative Environment" -
http://citeseer.ist.psu.edu/538613.html

This strategy is currently used in Mercury as well as Ben L.'s DDC
language; on that note, I think if you spent some time looking through
the runtime/generated code of DDC, you can see exactly what the paper
is talking about, because it's actually a very simple strategy for
holding onto GC roots:

http://code.haskell.org/ddc/ddc-head/runtime/

However, it's reasons like this (C is really hard to compile to
effectively) that Simon & co. have spent so much time on the C--
project. C-- is one of the backend languages used in GHC, and it is
designed to be a 'uniform assembly language' for high level languages
to compile to.

You can find a lot of info on C-- here; I recommend the paper at the
bottom to start off:

http://research.microsoft.com/en-us/um/people/simonpj/papers/c--/

These papers should further illustrate the issues with compiling to C;
for a pedagogical excursion, these issues can all be (simply) worked
around for the most part like Henderson's paper illustrates, but
they're worth keeping in mind, too.

Hopefully those papers should help you concerning your compilation
model and the route you would like to take. I can't say much about
laziness, other than read Simon Peyton-Jones' actual full book (it's
an amazing read):

http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/

That should help you concerning laziness/compilation etc..

As for the FFI, I don't really have any papers on 'how to implement an
FFI', but Matthias Blume's paper might be of interest:

"No-Longer-Foreign: Teaching an ML compiler to speak c "natively"" 

http://ttic.uchicago.edu/~blume/papers/nlffi-entcs.pdf

libffi may also be worth investigating:

http://sourceware.org/libffi/


> I have chosen the STG machine because I thought i would be easier to
> get an FFI and case exprs out of it. Maybe I am wrong, and template
> instantiation is really easier. There is also the BRISK machine, and
> GRIN. But the paper I read were less readable to me than the three
> mentioned.

How far into GRIN have you looked? It is one of the intermediate
languages for lhc/jhc, and for a low-level intermediate representation
it works quite well I think; it is a strict, monadic intermediate
language - being strict, we must still be able to model closures
('suspendeded applications' or thunks,) so we model closures/partial
applications in a sort of 'defunctionalized' way (making grin a sort
of 'defunctionalized intermediate language' really,) by turning them
into something along the lines of an algebraic data type in GRIN. A
good benefit of this is that you actually /don't/ have to write
eval/apply yourself, because the compiler is supposed to generate it
*for* you.

In fact, this is a core part of GRIN - the generation of eval by the
compiler is critical for many important optimizations to take
place. It also makes the runtime a bit simpler.

This comes at the price that GRIN is by design a whole-program
intermediate form; in order to pull off any of these sophisticated
transformations everything must be in memory at once. But as we have
seen with LHC/JHC, it can make a *huge* differ

Re: [Haskell-cafe] I want to write a compiler

2009-03-07 Thread wren ng thornton

Loup Vaillant wrote:

-> support algebraic data types and case expressions (unless I can get
away with encoding them as functions),


Which you always can,

data Foo = A a1...an | B b1...bn |...

==

type Foo :: forall r.
(a1->...->an -> r) ->
(b1->...->bn -> r) ->...
-> r

A a1...an = \fa _... -> fa a1...an

B b1...bn = \_ fb... -> fb b1...bn

...


The only trick is that you need to have closures (in order to build the 
Foos) and you need to have first-class functions (in order to pass the 
destructors in). If you're familiar with the STG machine, this is what 
they do under the covers anyways. At the core level, all you need is 
some primitive to force evaluation.


(Church Encoding)++

--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] I want to write a compiler

2009-03-08 Thread Ryan Ingram
I really like this book:

http://research.microsoft.com/en-us/um/people/simonpj/papers/pj-lester-book/

It walks you through building a compiler to a reasonable intermediate
language, which you can then start with; it's is a much easier problem
to convert that intermediate langauge to assembly, C, or LLVM.

  -- ryan

On Sat, Mar 7, 2009 at 5:20 PM, Jeremy Shaw  wrote:
> Hello,
>
> This book is pretty good IMO:
>
> http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/
>
> It does not cover STG, but it does cover a ton of useful stuff and is
> very well written.
>
> I can say from experience that the spineless-tagless-gmachine paper
> you referenced does contain all the information needed to implement an
> STG->C compiler. But, you should expect to spend a lot of time
> alternating between trying to implement the thing and rereading the
> paper.
>
> Also, you may find that it is easier to do STG->Assembly at first (via
> harpy or something). With assembly it is easier to express exactly
> what you want to happen. With C you have to try to figure out how to
> trick the C compiler into doing your bidding.
>
> Also, maybe it is more satisfying to output assembly, because then you
> really feel like you are writing a compiler ;)
>
> I am sure others will say that LLVM is a better option than C or
> assembly. I know almost nothing about LLVM, so they may be right.
>
> -- jeremy
>
> At Sun, 8 Mar 2009 01:32:48 +0200,
> Loup Vaillant wrote:
>>
>> This is a homework question. I am mainly looking for guidance or
>> pointers. Source code, even. (Not GHC's, though, it is too complicated
>> for me right now. The AST of STG is fine, but the rest kinda scares
>> me.)
>>
>> Ideally, I would very much like to compile to C.
>>
>> The requirements are easily stated. My language must
>> -> be lazy by default,
>> -> support algebraic data types and case expressions (unless I can get
>> away with encoding them as functions),
>> -> have some kind of FFI with C (I suppose it imply some support for
>> unboxed values).
>>
>> There is also the first class functions thing, but lambda lifting is okay.
>>
>> I don't care about any kind of execution efficiency. I just want my
>> compiler to be As Simple As Possible. Eventually, I intend to
>> bootstrap. Then, I will start dreaming about doing better than the
>> Mighty Simons. (I can dream, right?)
>>
>> I figured I would start by the back-end. I began to write 15 pathetic
>> lines of code to start an STG machine. Needles to say, I am stuck. I
>> don't know where to start.
>>
>> I am already familiar with some papers. In particular, [1] (on
>> template instantiation), [2] and [3]. I once wrote a simple graph
>> reducer using template instantiation (about 20 lines at most).
>>
>> I have chosen the STG machine because I thought i would be easier to
>> get an FFI and case exprs out of it. Maybe I am wrong, and template
>> instantiation is really easier. There is also the BRISK machine, and
>> GRIN. But the paper I read were less readable to me than the three
>> mentioned.
>>
>> So, If any of you can give me some pointer or guidance or advice about
>> where to start, I would be very grateful.
>>
>> Loup
>>
>>
>> [1] http://www.cs.otago.ac.nz/cosc459/books/pjlester.pdf
>> [2] 
>> http://research.microsoft.com/~simonpj/Papers/spineless-tagless-gmachine.ps.gz
>> [3] http://research.microsoft.com/~simonpj/Papers/eval-apply/index.htm
>> ___
>> Haskell-Cafe mailing list
>> Haskell-Cafe@haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] I want to write a compiler

2009-03-08 Thread Ben Lippmeier


On 08/03/2009, at 12:45 PM, Austin Seipp wrote:


For garbage collection, please see.

"Accurate Garbage Collection in an Uncooperative Environment" -
http://citeseer.ist.psu.edu/538613.html

This strategy is currently used in Mercury as well as Ben L.'s DDC
language; on that note, I think if you spent some time looking through
the runtime/generated code of DDC, you can see exactly what the paper
is talking about, because it's actually a very simple strategy for
holding onto GC roots:

http://code.haskell.org/ddc/ddc-head/runtime/


That paper explains the basic idea, but neither DDC or Mercury quite  
follow it (I asked Zoltan). The system in the paper keeps the GC roots  
in structs on the C stack, and chains the structs together as a linked  
list. The problem is that if you take a pointer to data on the C stack  
then GCC freaks out and disables a host of optimisations. I imagine  
it's worried about pointers going bad after the stack frame is popped  
and the space for the struct gets lost.


DDC keeps a shadow stack of GC roots in malloced memory. It's only a  
small difference, but lets the C compiler produce better code.


Ben.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] I want to write a compiler

2009-03-08 Thread Loup Vaillant
OK, thank you all for your responses. I'm impressed how fast you were.

2009/3/8 Rick R :
> http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours

I have it. However, Scheme is strict, which is the reason I didn't
complete my reading. Maybe I am wrong.

> Or go for the Real Thing:
> http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-35.html#%_sec_5.5

I just opened the page. I'll see what I can take. Thank you.


2009/3/8 Jeremy Shaw :
> I can say from experience that the spineless-tagless-gmachine paper
> you referenced does contain all the information needed to implement an
> STG->C compiler. But, you should expect to spend a lot of time
> alternating between trying to implement the thing and rereading the
> paper.

So I just didn't tried hard enough. Time for a re-read, then!

> Also, you may find that it is easier to do STG->Assembly at first (via
> harpy or something). With assembly it is easier to express exactly
> what you want to happen. With C you have to try to figure out how to
> trick the C compiler into doing your bidding.

Scary. I know only C.

> Also, maybe it is more satisfying to output assembly, because then you
> really feel like you are writing a compiler ;)

Well, maybe later.

> I am sure others will say that LLVM is a better option than C or
> assembly. I know almost nothing about LLVM, so they may be right.

Neither do I. But I definitely try this. One day.

> -- jeremy


2009/3/8 Austin Seipp :
>
> Excerpts from Loup Vaillant's message of Sat Mar 07 17:32:48 -0600 2009:
>> Ideally, I would very much like to compile to C.
>> [...]
>> -> have some kind of FFI with C (I suppose it imply some support for
>> unboxed values).
>
> Unfortunately, after working on LHC, I can verifiably say (like all
> the researchers who wrote the papers which I *didn't* read
> beforehand,) that for a lot of purposes, C is an unsuitable fit for a
> compilers' target language. It works pretty well if you can make the
> source language execution model fit well enough, but if you can't,
> you've a price to pay (both in sanity and/or performance.)

Yep.

> However, since your goal is *not* efficiency, you will be happy to
> know that the issues with compiling to C (like garbage collection and
> exception handling) can be worked around viably.

I hope so.

> For garbage collection, please see.
>
> "Accurate Garbage Collection in an Uncooperative Environment" -
> http://citeseer.ist.psu.edu/538613.html
> [...]
> http://code.haskell.org/ddc/ddc-head/runtime/

Thank you, I'll look into that.

> You can find a lot of info on C-- here; I recommend the paper at the
> bottom to start off:
>
> http://research.microsoft.com/en-us/um/people/simonpj/papers/c--/

I'll look at that. But I am not sure it is the simplest path for me: I know C,
and I don't know C-- —yet.

> http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/

I have it. Maybe I should study it a bit more.


> As for the FFI, I don't really have any papers on 'how to implement an
> FFI', but Matthias Blume's paper might be of interest:
>
> "No-Longer-Foreign: Teaching an ML compiler to speak c "natively""
> http://ttic.uchicago.edu/~blume/papers/nlffi-entcs.pdf
>
> libffi may also be worth investigating:
> http://sourceware.org/libffi/

I didn't know them, thank you.

>> I have chosen the STG machine because I thought i would be easier to
>> get an FFI and case exprs out of it. Maybe I am wrong, and template
>> instantiation is really easier. There is also the BRISK machine, and
>> GRIN. But the paper I read were less readable to me than the three
>> mentioned.
>
> How far into GRIN have you looked?

Not far enough, it seems.

> It is one of the intermediate
> languages for lhc/jhc, and for a low-level intermediate representation
> it works quite well I think; it is a strict, monadic intermediate
> language - being strict, we must still be able to model closures
> ('suspendeded applications' or thunks,) so we model closures/partial
> applications in a sort of 'defunctionalized' way (making grin a sort
> of 'defunctionalized intermediate language' really,) by turning them
> into something along the lines of an algebraic data type in GRIN. A
> good benefit of this is that you actually /don't/ have to write
> eval/apply yourself, because the compiler is supposed to generate it
> *for* you.

Here is my biggest problem with GRIN: it is not STG by far. I formatted
my brain one way, so I find difficult to reformat it another way

> In fact, this is a core part of GRIN - the generation of eval by the
> compiler is critical for many important optimizations to take
> place. It also makes the runtime a bit simpler.

If I find GRIN is overall simpler, I am happy.

> This comes at the price that GRIN is by design a whole-program
> intermediate form; in order to pull off any of these sophisticated
> transformations everything must be in memory at once. But as we have
> seen with LHC/JHC, it can make a *huge* difference (apps that are 10x

Re: [Haskell-cafe] I want to write a compiler

2009-03-08 Thread Loup Vaillant
Thanks to the courage you all gave me, I found the strength to go on.

Template instantiation seems to be the easiest path, so I decided to
follow it. I think I solved the problem about primitives and FFI I
though it had. Namely, how the hell do I compile primitive calls with
an arbitrary number of arguments.

Starting from the Tutorial of Peyton Jones and Lester, I added a new
node type to help guide the order of evaluation. Basically, this "Seq"
node takes the address of two other nodes. It pushes them both on the
stack, like an application node would. (That is because I don't want a
dump, but I could as well create a new stack and and push the previous
one on the dump.)

Note the nodes which are evaluated this way must not have a functional
head normal form (primitive or supercombinator). That would leak past
the beginning of the stack.

When the node is finally evaluated (e.g. points to a HNF, like a boxed
integer), the topmost and second topmost pointers on the stack are
swapped. (or the topmost pointer of the stack and the topmost pointer
of the topmost stack on the dump are swapped).

With a suitable compilation scheme, that should guarantee that
primitives have their arguments already evaluated. (And avoid infinite
loops).

Now, I just have to implement it. :-)

Cheers,
Loup.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] I want to write a compiler

2009-03-09 Thread John Meacham
On Sat, Mar 07, 2009 at 07:45:06PM -0600, Austin Seipp wrote:
> (On that note, I am currently of the opinion that most of LHC's major
> deficiencies, aside from a few parser bugs or some needed
> optimizations, comes from the fact that compiling to C is currently
> our only option; because of it, we have no exception handling or
> proper garbage collection at all. As well, the runtime system is a
> little needlessly 'clever' (if small and understandable) so it can
> deal with that.)

It would be interesting if you could revive the ghc back end I wrote for
jhc in lhc. the code is still in the repository but was disabled a while
ago, and I was just fretting over whether I should just delete it from
the codebase as an interesting experiment. I mainly used it as a
debugging aid once upon a time, but it was difficult to keep up to date
with the C back end. I know it is sort of a silly back end, but it might
be interesting.

> Having dealt with GRIN as I work on LHC (when time now permits...) I
> can say that it's a reasonable strategy and in practice it turns out
> pretty well, but if you're not into optimizing the code, then STG
> might be a better fit.

I think a big deciding factor here would be the answer to one question
"do you want to deal with unboxed values in your compiler internally?"
As in, you plan on a lazy language, so, do you ever want to open up
those thunks and deal with unboxed values in your compiler guts or do
you want to treat them as abstract boxes to be evaluated by the runtime?
if you do want to think about unboxed values, for optimization or other
purposes, bite the bullet and go for something like GRIN as the back end
and support unboxed values all the way through to the front end from the
get go. If you really only want to support lazy thunks, go with one of
the quasi virtual machine style implementations like STG.

John

-- 
John Meacham - ⑆repetae.net⑆john⑈
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] I want to write a compiler

2009-03-09 Thread Austin Seipp
Excerpts from John Meacham's message of Mon Mar 09 07:28:25 -0500 2009:
> On Sat, Mar 07, 2009 at 07:45:06PM -0600, Austin Seipp wrote:
> > (On that note, I am currently of the opinion that most of LHC's major
> > deficiencies, aside from a few parser bugs or some needed
> > optimizations, comes from the fact that compiling to C is currently
> > our only option; because of it, we have no exception handling or
> > proper garbage collection at all. As well, the runtime system is a
> > little needlessly 'clever' (if small and understandable) so it can
> > deal with that.)
> 
> It would be interesting if you could revive the ghc back end I wrote for
> jhc in lhc. the code is still in the repository but was disabled a while
> ago, and I was just fretting over whether I should just delete it from
> the codebase as an interesting experiment. I mainly used it as a
> debugging aid once upon a time, but it was difficult to keep up to date
> with the C back end. I know it is sort of a silly back end, but it might
> be interesting.

Indeed, I stumbled upon it whilst looking at how unsafeCoerce worked
(to find out it is super-duper-special and implemented as part of E.)
I think it's actually pretty clever, and who knows, maybe it could be
useful as at least a debugging aid. :)

> I think a big deciding factor here would be the answer to one question
> "do you want to deal with unboxed values in your compiler internally?"
> As in, you plan on a lazy language, so, do you ever want to open up
> those thunks and deal with unboxed values in your compiler guts or do
> you want to treat them as abstract boxes to be evaluated by the runtime?
> if you do want to think about unboxed values, for optimization or other
> purposes, bite the bullet and go for something like GRIN as the back end
> and support unboxed values all the way through to the front end from the
> get go. If you really only want to support lazy thunks, go with one of
> the quasi virtual machine style implementations like STG.
> 
> John
> 

This is a very good point I hadn't even thought about! Indeed, since
GRIN represents thunks in a defunctionalized way - encoded as nodes -
dealing with boxed/unboxed values becomes more of the compiler's job,
since the nature of unboxed values etc. becomes more transparent.

Since you bring this up, I figure this decision also had some
influence on E in lhc/jhc, considering its type system is rich enough
to distinguish values in whnf/boxed/unboxed etc.?

Austin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] I want to write a compiler

2009-03-09 Thread John Meacham
On Mon, Mar 09, 2009 at 09:25:58AM -0500, Austin Seipp wrote:
> Indeed, I stumbled upon it whilst looking at how unsafeCoerce worked
> (to find out it is super-duper-special and implemented as part of E.)
> I think it's actually pretty clever, and who knows, maybe it could be
> useful as at least a debugging aid. :)

It is actually  not all that special, it is just a primitive like any
other than transates to a nop, or no code. Since the optimizer can't
know what is inside a primitive in general, it leaves it alone and the
only affect is the type coercion implicit in its signature. When
translating to Grin, the unsafeCoerces are simply dropped. Coercions
used to be more complicated because once upon a time I required them to
be used to translate to/from newtype declared synonyms, I found they
inhibited optimizations too much so switched to the current mechanism
whereby newtype expansion happens transparently in the type checker when
needed. As a side effect I worked hard to try to recognize and optimize
around coercions, but now that they occur fairly rarely, it isn't much
of an issue any more.

> This is a very good point I hadn't even thought about! Indeed, since
> GRIN represents thunks in a defunctionalized way - encoded as nodes -
> dealing with boxed/unboxed values becomes more of the compiler's job,
> since the nature of unboxed values etc. becomes more transparent.

The main reason I think it is important is because I feel not including
unboxed values from the get-go was one of the biggest false starts I had
when writing jhc. When I initially started writing it I considered
unboxed values as a feature to be added later. When I started
implementing them, I found that there were a lot of decisions I would
have made differently and a lot of places they would have greatly
simplified my initial design had I had them from the start. Namely, A
whole host of 'compiler magic' could have gone away if I implemented
all primitive types in pure haskell like I implement floats and chars now,

data Float = Float_ Float32_ 
data Char = Char Bits32_

(where Float32_ and Bits32_ are the unboxed raw types)

Since instead I had the compiler implement Int and friends directly, it
suddenly had the responisibity to conjure up all sorts of operations on
them, such as the Num instances, creating the big mess of compiler magic
in PrimitiveOperators.hs. The more I can express in the haskell source
directly the better, it raises odd questions when the compiler has to do
things like implement Num instances directly, like the instances are
built in, but we might not have 'imported' the types needed to declare
them into the current scope, so we end up having to deal with instancse
for types that don't exist yet.

In any case, yeah. if you plan on unboxed types at some point, include
them from the start because they can simplify a lot to offset their cost
of implementation, by trying to add them on later I ended up having to
pay the cost of implementing them, but also had to deal with the cruft
left over from not utilizing them from the beginning.


> Since you bring this up, I figure this decision also had some
> influence on E in lhc/jhc, considering its type system is rich enough
> to distinguish values in whnf/boxed/unboxed etc.?

Yeah, there were a couple things that motivated this. A big one is that
evals are bad in jhc, signifigantly moreso than traditional ghc. The
points-to analysis is the main way to fight it, but I also decided I
wanted to give it the best shot possible by getting rid of as many evals
as I could in higher level optimizations, which necessitated a fairly
rich type system since it actually made sense for jhc to keep strict
track of what was in WHNF vs not even if they were both boxed. in
general ghc didn't care unless a value was either CPRable or unboxable,
but jhc can benefit from such knowledge no matter what the type, even if
it is a partially applied function. Though, it is interesting that GHC
may be moving to a jhc-style semi-tagging mechanism according to this
paper[1] so more optimizations may help both equally in the future. 

The other main motivating factor was that I was always enamoured with
the idea of jhc supporting more than one source language, in particular,
I wanted to explore it having an SML front end, so wanted a core type
system that was equally good at expressing lazy haskell as strict ML,
and more importantly would be able to seamlessly intermingle the two
without difficulty. So, when designing it, I kept that in mind, "is this
equally useful as a ML compilers core?" or a "wacky ML/Haskell hybrid?".
Although the ML front end is more of a thought experiment at this point,
I think it helped me come up with a coherent type system that had
intruiging possibilies for haskell optimizations, as well as some
language implications, such as the ability to declare new 'strict'
algebraic types that may lead to new ideas for language extensions.

John

[1]
http://research.microsoft.c