Tom Hawkins wrote: ] My DSLs invariably define a datatype to capture expressions; something ] like this: ] ] data Expression ] = Add Expression Expression ] | Sub Expression Expression ] | Variable String ] | Constant Int ] deriving Eq ] The problem comes when I want to generate efficient code from an ] Expression (ie. to C or some other target language). The method I use ] invovles converting the tree of subexpressions into an acyclic graphic ] to eliminate common subexpressions. The nodes are then topologically ] ordered and assigned an instruction, or statement for each node. For ] example: ] ] let a = Add (Constant 10) (Variable "i1") ] b = Sub (Variable "i2") (Constant 2) ] c = Add a b ] The process of converting an expression tree to a graph uses either Eq ] or Ord (either derived or a custom instance) to search and build a set ] of unique nodes to be ordered for execution. In this case "a", then ] "b", then "c". The problem is expressions often have shared, ] equivalent subnodes, which dramatically grows the size of the tree. ] For example: ] ] let d = Add c c ] e = Add d d -- "e" now as 16 leaf nodes. ] ] As these trees grow in size, the equality comparison in graph ] construction quickly becomes the bottleneck for DSL compilation.
We show the design of the same sort of DSL that explicitly maintains sharing information and where node comparisons are quick, because we are comparing hashes rather than trees themselves. Our approach is assuredly safe, pure, and Haskell98 (save for disabling of the monomorphism restriction, which is done solely to avoid writing signatures). No GHC-specific behavior is relied upon. > {-# OPTIONS_GHC -fno-monomorphism-restriction #-} > > module DSL where > > import Data.IntMap as IM > import Control.Monad.State The approach is based on the final tagless representation. Here is our DSL: > class Exp repr where > constant :: Int -> repr Int > variable :: String -> repr Int > add :: repr Int -> repr Int -> repr Int > sub :: repr Int -> repr Int -> repr Int Tom Hawkins' test expressions now look as follows > a = add (constant 10) (variable "i1") > b = sub (variable "i2") (constant 2) > c = add a b > d = add c c > e = add d d which is the same as before modulo the case of the identifiers: everything is in lower case. We can show the expressions as before: showing is one way of evaluating things > newtype S t = S{unS :: String} > > instance Exp S where > constant x = S $ show x > variable x = S x > add e1 e2 = S( unS e1 ++ " + " ++ unS e2) > sub e1 e2 = S( unS e1 ++ " - " ++ unS e2) > > test_showe = unS e *DSL> test_showe "10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2" We can write an evaluator for the expressions > type REnv = [(String,Int)] > newtype R t = R{unR :: REnv -> t} -- A reader Monad, actually > > instance Exp R where > constant x = R $ const x > variable x = R ( \env -> maybe (error $ "no var: " ++ x) id $ > Prelude.lookup x env) > add e1 e2 = R(\env -> unR e1 env + unR e2 env) > sub e1 e2 = R(\env -> unR e1 env - unR e2 env) > test_vale = unR e [("i1",5),("i2",10)] -- 92 We stress: we are using exactly the same expression e as before. We are only evaluating it differently. The gist of the final tagless approach is to write an expression once and evaluate it many times. Now, we chose a different representation: to make sharing explicit We chose not to rely on GHC; we don't care if in (add c c), the two c are shared or copied. It is not observable in pure Haskell, and we don't care. We build our acyclic graph nevertheless. > type ExpHash = Int We stress: ACode is NOT a recursive data structure, so the comparison of ACode values takes constant time! > data ACode = AConst ExpHash | > AVar ExpHash | > AAdd ExpHash | > ASub ExpHash > deriving (Eq,Show) > data ExpMaps = ExpMaps{ hashcnt :: ExpHash, -- to generate new Hash > ctmap :: IntMap Int, > vrmap :: IntMap String, > admap :: IntMap (ACode,ACode), > sumap :: IntMap (ACode,ACode)} > deriving Show > exmap0 = ExpMaps 0 IM.empty IM.empty IM.empty IM.empty > > > newtype A t = A{unA :: State ExpMaps ACode} Granted, the following could be done far more efficiently: we need bimaps. > loookupv :: Eq v => v -> IntMap v -> Maybe Int > loookupv v = IM.foldWithKey > (\k e z -> maybe (if e == v then Just k else Nothing) (const z) z) > Nothing > > record con prj upd x = do > s <- get > maybe (do let s' = upd (s{hashcnt = succ (hashcnt s)}) > (IM.insert (hashcnt s) x (prj s)) > put s' > return (con $ hashcnt s)) > (return . con) $ loookupv x (prj s) > instance Exp A where > constant x = A(record AConst ctmap (\s e -> s{ctmap = e}) x) > variable x = A(record AVar vrmap (\s e -> s{vrmap = e}) x) > add e1 e2 = A(do > h1 <- unA e1 > h2 <- unA e2 > record AAdd admap (\s e -> s{admap = e}) (h1,h2)) > sub e1 e2 = A(do > h1 <- unA e1 > h2 <- unA e2 > record ASub sumap (\s e -> s{sumap = e}) (h1,h2)) Again, we are using the very same expression e we wrote at the very beginning: > test_sme = runState (unA e) exmap0 *DSL> test_sme (AAdd 8, ExpMaps {hashcnt = 9, ctmap = fromList [(0,10),(4,2)], vrmap = fromList [(1,"i1"),(3,"i2")], admap = fromList [(2,(AConst 0,AVar 1)),(6,(AAdd 2,ASub 5)), (7,(AAdd 6,AAdd 6)),(8,(AAdd 7,AAdd 7))], sumap = fromList [(5,(AVar 3,AConst 4))]}) We retain all the information about expression 'e'. In addition, all sharing is fully explicit. As we can see, the evaluation process finds common subexpressions automatically. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe