Con Terzoglou <[EMAIL PROTECTED]> writes:
> Currently I am attempting to convert some of my
> Haskell programs into C programs. I am finding great
> difficulty in this. I was wondering if they were any
> hints or tips that may be of assistance. 

Interesting problem and probably very instructive to do.  (The OS
kernel hackers that I work with sometimes comment that my C code looks
funny - I like to believe this is because I also write Haskell.)

Assuming you want to do it by hand, there's three main issues to
consider:

1) Lazy vs Strict.

   You might also start by translating your programs which exploit
   laziness into programs which would work well under strict
   evaluation but are still legal Haskell programs.  (Some of the Haskell
   compilers (not Hugs) used to have flags to make everything strict.  Or
   you could translate to ML.)
   
   You might try reading "Why Functional Programming Matters"
   (http://www.md.chalmers.se/~rjmh/Papers/whyfp.html) which identifies
   some key differences between strict and lazy functional programming.
   Everyplace where Hughes identifies something good about [lazy]
   functional programming, do the opposite.

   In some cases, you won't be able to (or won't want to) eliminate
   laziness.  Your best bet is to read a book or paper about how lazy
   evaluation is implemented.  The first part of Simon Peyton Jones'
   STG paper has a good overview - your goal should be to pick the
   one that is simplest to implement/use in C not what is most
   efficient (or was most efficient when the paper was written).

     
http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/spineless-tagless-gmachine.ps.gz&pub=34
   
2) Memory allocation/deallocation.

   You have to do manual memory management in C.
   Choices include:

   1) Keep careful track of when you have a pointer to an object and
      call free when it's dead.  Problem is what to do when an object
      has multiple pointers to it.  (It's also a little error prone.)

   2) Put a reference count in every object, increment the reference
      when you duplicate a pointer to an object, decrement the 
      reference when you discard a pointer.  The big problem is that
      every object in a cyclic data structure can have a reference 
      count of 1 even though there are no pointers to the structure
      as a whole.  (It's also _very_ error-prone.  Unlike (1), it is
      a nightmare to debug because it's hard to figure out which
      part of the system to blame.  The Mozilla project uses reference
      counting and has developed a cool tool for helping you track
      down reference counting errors.  The intro is pretty good too.
      http://www.mozilla.org/performance/refcnt-balancer.html)

   3) Use a conservative garbage collector such as that used in Hugs
      or the popular Boehm-Weiser collector.  This is easy but will 
      not teach you an essential skill for C programmers to learn.

   A more minor issue is allocation.  This is easy enough: for every data
   constructor, write an allocation function to build such an object.
   The translation from datatypes to C's union-of-structs is quite 
   straightforward.  Incidentally, using an object to represent [] (Nil)
   (the Haskell way) instead of using a Null pointer has a beneficial
   effect on your code - I recommend you try it.   

3) Type systems


   Translating ad-hoc polymorphism (ie type classes) can range from 
   trivial to hard depending on what kind of code you write.  At the easy
   end, you just figure out what type the function is used at and
   call the appropriate method.
   At the hard end, you have to duplicate code or tag objects with
   enough information about their type that you know how to implement
   show, ==, + (or whatever).  Various abuses of the C preprocessor
   let you semi-automate this process.


   You can translate parametric polymorphism to C quite easily by making
   sure that most things are pointers and casting.

   A useful trick which makes debugging easier though is to put all your
   casts in little inline functions like this:

     extern inline directory* castFileToDir(file* f) { return (directory*)f; }

  The advantages of using this instead of naked casts are:

  1) You'll get a type error if the argument to this function has the
     wrong type.  (This is not true of cast which will cast any 
     pointer to any other pointer without a murmur.)

  2) It's easier to tell what's going on and to grep for uses of a 
     particular "cast" or, since all such functions begin with the 
     word "cast", to find all casts in the system.

  3) When you run into debugging problems, you can turn off inlining
     (read the gcc documentation for why I used "extern inline")     
     and set a breakpoint on the cast function.  Or you can insert
     sanity checks on the casts or profiling code or ...


Hope this helps,

--
Alastair Reid        [EMAIL PROTECTED]        http://www.cs.utah.edu/~reid/

Reply via email to