Lists and FP (was Re: Functions allowing either 0 or 1 parameters?)

Leon Smith Fri, 19 Feb 1999 01:19:18 -0500
Don't be shy about using lists when writing Haskell Code.  Since Haskell is
lazy, lists are often consumed as they are created, so in many cases they
do not use extra memory.  Lists really are the key to clarity AND efficiency.

These definitions are the most efficient of those proposed for the problem:

>More "elegant". That is, shorter:
>> g :: Integer
>> g = sum (map f [0..20])

>A perhaps more efficient version of the previous function.
>> g :: Integer
>> g = foldl (\sum i -> sum + (f i)) 0 [0..20]

In this particular case, pretty much any Haskell compiler will
automatically perform optimizations that will transform the first
definition into the second definition.  So, the first definition will
create the same object code as the second.  Even if the implementation
doesn't optimize it, the fact that Haskell is lazy means that there isn't a
huge difference between the two.  The following is how a implementation
would reduce our un-optimized expression:

   sum (map f [0..20])

 = foldl (+) 0 ( map f  (0 : [1..20]) )

 = foldl (+) 0 ( f 0 : (map f [1..20]) )

 = foldl (+) (0 + f 0) ( map f (1 : [2..20]) )

 = foldl (+) (0 + f 0) ( f 1 : (map f [2..20]) )

 = foldl (+) (0 + f 0 + f 1) ( map f [2..20] )

 ...

(Actually, to achieve the same efficiency, one would have to use fold'
instead of foldl in the second definition.  This is because foldl' uses
strictness annotations to improve its efficiency, and sum is actually
defined in terms of foldl'.  In our reduction sequence, this would amount
to evaluating the second argument of foldl immediately, instead of building
a expression that will be evaluated later.)

Now, doesn't the functional version express what you want to do more
concisely, clearly, and directly than its imperative counterpart?

   g = sum (map f [0..20])


   int g = 0;
   for (int i = 0, i <= 20, i++) {
     g += f(i);
   }

The various list functions (foldl, foldr, scanl, scanr, etc.)  often take
the place of looping constructs in imperative languages.  In fact, FP is so
flexible that you can write functions that will "iterate" on values in
pretty much any way desired. 

I would strongly recommend reading "Introduction to Functional Programming
using Haskell, 2nd ed." by Richard Bird.  It is a really wonderful book.  

best,
leon

P.S.  One good demonstration of lists is computing factorials.  For
example, one could write:

   fact :: Integer -> Integer
   fact 0       = 1
   fact (n + 1) = (n + 1) * fact n

Let's start Hugs and calculate 10000!:

>Main> fact 10000
>
>(127701 reductions, 195550 cells, 2 garbage collections)
>ERROR: Control stack overflow

This definition follows straightforwardly from the mathematical definition
of factorials.  However, it uses stack space proportional to n.  We could
achieve efficient tail recursion by this careful definition:  (Ignoring the
function's behavior on n < 0)
 
   fact2 :: Integer -> Integer 
   fact2  = impl 1

     where
        impl :: Integer -> Integer -> Integer
        impl x n | n <= 1    = x
                 | otherwise = impl (x*n) (n-1)

>Main> fact2 10000
>
>(240449 reductions, 409653 cells, 18 garbage collections)
>ERROR: Garbage collection fails to reclaim sufficient space

We have improved both the time and space efficiency of the definition.
However, we have run into memory problems because Haskell is lazy.
(Sometimes being lazy is good, other times it is bad!) Using strictness
annotations to solve this problem:

   fact3 :: Integer -> Integer
   fact3 = impl 1
     where 
       impl :: Integer -> Integer -> Integer
       impl x n | n <= 1    = x
                | otherwise = (impl $! (x*n)) (n-1)

>Main> fact3 10000
>284625968091705451890641321211986{Interrupted!}
>
>(190012 reductions, 47622336 cells, 590 garbage collections) 

And 124 seconds of time on my somewhat elderly PC. 

Not expecting anything better, we could try a fourth definition using lists:

  fact4  :: Integer -> Integer
  fact4 n = product [1..n]

>Main> fact4 10000
>2846259680917054518906413212119868890148{Interrupted!}
>
>(210031 reductions, 42217590 cells, 520 garbage collections)

Surprisingly, this took only 110 seconds.  After all this work, we find
that the definition that makes careful use of lists is also the most
efficient, readable, and concise.  Looking at the definition of product and
fold:

   product = foldl' (*) 1

Where foldl' is just a strict version of foldl.

In FP, there are two incredibly useful functions that deal with lists:
foldl and foldr. (Pronounced fold-left and fold-right respectively.)

   foldl           :: (a -> b -> a) -> a -> [b] -> a
   foldl f a []     = a
   foldl f a (x:xs) = foldl f (f a x) xs

   foldr           :: (b -> a -> a) -> a -> [b] -> a
   foldr f z []     = z
   foldr f z (x:xs) = f x (foldr f z xs)


We can define a function that will find the sum of a list with these


 sum = foldr (+) 0
 -- OR
 sum = foldl {+} 0


One useful way to think about foldl and foldr is that
   sum         (1 : 2 : 3 : 4 : 5 : [])
=  foldr (+) 0 (1 : (2 : (3 : (4 : (5 : [])))))
=  1 + (2 + (3 + (4 + (5 + 0))))


We are replacing (:) with (+) and [] with 0.  Thus we can see:


   foldr (:) [] xs = xs


Likewise, using our second definition of sum
    sum [1, 2, 3, 4, 5]
=  foldl (+) 0 (1 : 2 : 3 : 4 : 5 : [])
=  (((( 0 + 1 ) + 2 ) + 3 ) + 4 ) + 5 )


Here, we are replacing (:) with (+) and making the whole thing associate to
the left.  This is closely related to reversing the list and then folding
right.  In fact:

   foldr func a xs = foldl cnuf a (reverse xs)
     where cnuf y x = func x y

(cnuf is func spelled backwards, of course!)


Another way to think about it is that foldr is a construct for
stack-recursion and foldl is a construct for accumulation-recursion over
lists.
Lists and FP (was Re: Functions allowing either 0 or 1 parameters?)

Reply via email to