On Mar 8, 2007, at 12:00 PM, David Roundy wrote:

Hi all,

I was just teaching a class on minimization methods, with a focus on conjugate gradient minimization in particular, and one of my main points was that with conjugate gradient minimization we only need three or four arrays of size N (depending on whether we use the Fletcher-Reeves or Polak-Ribiere variant), ... This got me thinking about one of the largest problems with writing serious numerical code in Haskell, which is that of memory consumption and avoiding temporary variables.

I've been following this discussion with interest, as I've been looking in some detail at conjugate gradient algorithms as part of my day job, and I've spent a good deal of time thinking about exactly the problems you raise. For those following along at home, here's a sample somewhat-imperative CG algorithm (this is the simplified stripped-down version):

    for j <- seq(1#cgit) do
      q = A p
      alpha = rho / (p DOT q)
      z += alpha p
      rho0 = rho
      r -= alpha q
      rho := r DOT r
      beta = rho / rho0
      p := r + beta p
    end

Here p,q,r, and z are vectors, A is the derivative of our function (in this case a sparse symmetric positive-definite matrix, but we can really think of it as a higher-order function of type Vector->Vector) and the greek letters are scalars. The "answer" is z. In practice we'd not run a fixed number of iterations, but instead do a convergence test. All the hard work is in the line "q = A p", but the storage consumption is mostly in the surrounding code. On a parallel machine (where these sorts of programs are often run) this part of the algorithm has almost no parallelism---all those dot products and normalizations preclude it---but the A p step and the dot products themselves are of course quite parallelizable.
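
To make the comparison concrete, here is roughly what one iteration of that loop looks like if we transliterate it into the obvious purely functional style (just a sketch: the list-backed Vector type and the names dot, axpy, and cgStep are mine, not from any library):

    import Data.List (foldl')

    type Vector = [Double]            -- stand-in; real code would use unboxed arrays

    dot :: Vector -> Vector -> Double
    dot xs ys = foldl' (+) 0 (zipWith (*) xs ys)

    axpy :: Double -> Vector -> Vector -> Vector    -- axpy a x y = a*x + y
    axpy a xs ys = zipWith (\x y -> a * x + y) xs ys

    -- One iteration of the loop above, written declaratively.  Every
    -- "updated" vector is a freshly allocated one; the old z, r, and p
    -- become garbage that the imperative code would simply have overwritten.
    cgStep :: (Vector -> Vector)                  -- the operator A
           -> (Vector, Vector, Vector, Double)    -- (z, r, p, rho)
           -> (Vector, Vector, Vector, Double)
    cgStep applyA (z, r, p, rho) = (z', r', p', rho')
      where
        q     = applyA p
        alpha = rho / (p `dot` q)
        z'    = axpy alpha p z             -- z += alpha p
        r'    = axpy (negate alpha) q r    -- r -= alpha q
        rho'  = r' `dot` r'                -- rho := r DOT r
        beta  = rho' / rho                 -- the old rho plays the role of rho0
        p'    = axpy beta p r'             -- p := r + beta p

It reads like the mathematics, but every vector bound on an iteration is a fresh allocation, which is exactly the memory behaviour David is worried about.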

Sadly, many of the suggestions, while generally sound, just don't apply well to this particular use case.

* As others have noted, burying the updatable state in a monad just begs the question. The resulting code looks nothing at all like the mathematics, either, which is a big problem if you're trying to understand and maintain it (the above code is virtually isomorphic to the problem specification). I'm sure David is seeking a more declarative version of the code than the spec from which we were working. Note that rather than embedding all the state in a special monad, we might as well be using update-in-place techniques (such as the += and -= operations you see above) with the monads we've already got; the result will at least be readable, but it will be too imperative.
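
Here is what I mean, as a sketch using IOUArray (the helpers axpyInPlace, dotM, and cgStepIO are my names, not library functions): it reuses storage and is perfectly readable, but it is just the imperative program written out in a monad.

    import Data.Array.IO (IOUArray, getBounds, readArray, writeArray)
    import Control.Monad (forM_)

    type Vec = IOUArray Int Double

    -- x += a * y, updating x in place.
    axpyInPlace :: Double -> Vec -> Vec -> IO ()
    axpyInPlace a y x = do
      (lo, hi) <- getBounds x
      forM_ [lo .. hi] $ \i -> do
        xi <- readArray x i
        yi <- readArray y i
        writeArray x i (xi + a * yi)

    dotM :: Vec -> Vec -> IO Double
    dotM x y = do
      (lo, hi) <- getBounds x
      let go acc i
            | i > hi    = return acc
            | otherwise = do xi <- readArray x i
                             yi <- readArray y i
                             let acc' = acc + xi * yi
                             acc' `seq` go acc' (i + 1)
      go 0 lo

    -- One iteration, with the monad making the sequencing explicit.  It
    -- reuses all the storage, but it is simply the imperative program
    -- transliterated.
    cgStepIO :: (Vec -> IO Vec) -> Vec -> Vec -> Vec -> Double -> IO Double
    cgStepIO applyA z r p rho = do
      q  <- applyA p
      pq <- dotM p q
      let alpha = rho / pq
      axpyInPlace alpha p z              -- z += alpha p
      axpyInPlace (negate alpha) q r     -- r -= alpha q
      rho' <- dotM r r                   -- rho := r DOT r
      let beta = rho' / rho
      (lo, hi) <- getBounds p            -- p := r + beta p, in place
      forM_ [lo .. hi] $ \i -> do
        ri   <- readArray r i
        pOld <- readArray p i
        writeArray p i (ri + beta * pOld)
      return rho'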

* There are opportunities for loop fusion in CG algorithms---we can compute multiple dot products on the same array in a single loop---but these have the flavor of fusing multiple consumers of a data structure into a single consumer, which I believe is still an unsolved problem in equational frameworks like foldr/build or streams. It's a bit like fusing:

   n = foldl' (+) 0 . map (const 1) $ xs
   sum_xs = foldl' (+) 0 $ xs
   sum_sq = foldl' (+) 0 . map (\x->x*x) $ xs

into:

   (n,sum_xs,sum_sq) =
      foldl' (\(a0,b0,c0) (an,bn,cn)->(a0+an, b0+bn, c0+cn)) (0,0,0) .
      map (\x->(const 1 x, x, x*x)) $ xs

which we understand how to do, but not equationally (or at least we didn't last I looked, though Andy Gill and I had both fantasized about possibilities for doing so).
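
Done by hand, the fused traversal is just one strict accumulating pass, something like the sketch below (stats is my name for it; note that the foldl'-over-a-triple version above still builds thunks inside the tuple components, so the strictness annotations here are doing real work):

    {-# LANGUAGE BangPatterns #-}

    -- The three folds over xs, fused by hand into one strict pass.
    stats :: [Double] -> (Double, Double, Double)
    stats = go 0 0 0
      where
        go !n !s !sq []       = (n, s, sq)
        go !n !s !sq (x : xs) = go (n + 1) (s + x) (sq + x * x) xs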

None of these fusion opportunities actually save space, which makes the problem trickier still.

* The algorithm as written already tries to express minimal storage. The only question is, do +=, -=, and := update their left-hand side in place, or do we think of them (in the Haskell model of the universe) as producing fresh arrays, with the previous values becoming newly-created garbage? My challenge to fellow Haskell hackers: find a way to express this such that it doesn't look so imperative.

* Even if we do a good job with += and -=, which is what David seems to be looking for, we still have those irritating := assignments---we'd like to throw out the old p and reuse the space in the last line. And we'd like to have one piece of storage to hold the q on each successive iteration. So even if we solve David's problem, we still haven't matched the space requirements of the imperative code.

* DiffArrays are too expensive to be acceptable here. It's not even a question of unboxing. Let's say we keep the current array in fast, unboxed storage; this lets us read its elements using a single load. Each update still needs to retrieve the old data from the array and add it to the old-versions lookup table; together these operations are much more expensive than the actual update to the current version of the array. And we need to do this even though no old versions exist! We should be able to avoid this work entirely. (And, if old versions do exist, a DiffArray is the pessimal representation for them, given that we're updating the whole array.)
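
For concreteness, here is the whole-array update pattern I'm describing, written against the old Data.Array.Diff module (a sketch; axpyDiff is my name, and depending on your version of the array package the module may live elsewhere):

    import Data.Array.Diff   (DiffUArray)
    import Data.Array.IArray ((!), (//), indices)

    type DVec = DiffUArray Int Double

    -- z += alpha p, with DiffArray semantics.  The (//) update walks the
    -- whole array, and for every element the displaced old value has to
    -- be squirreled away so that the superseded version of z could still
    -- be read---pure overhead when that old z is about to become garbage.
    axpyDiff :: Double -> DVec -> DVec -> DVec
    axpyDiff alpha p z = z // [ (i, z ! i + alpha * (p ! i)) | i <- indices z ]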

* Linear or uniqueness types are almost what we want. I think Josef Svenningsson was the one who captured this the best: uniqueness types *require* that the *caller* of these routines make sure that it is not sharing the data. So we need two copies of our linear algebra library---one which takes unique arrays as arguments and updates in place, and one which takes non-unique arrays and allocates. And we have to pick the right one based on context. What we want, it seems to me, is one library and a compiler which can make informed judgments.
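
The closest we get to "one library" in today's Haskell is to write the in-place version once inside ST and wrap it, but then the wrapper has to copy its argument first, which is precisely the allocation we wanted to avoid. A sketch (axpyCopying is my name; it computes x + a*y into fresh storage by copying x and updating the copy):

    import Control.Monad      (forM_)
    import Data.Array.IArray  ((!))
    import Data.Array.MArray  (thaw, getBounds, readArray, writeArray)
    import Data.Array.ST      (runSTUArray)
    import Data.Array.Unboxed (UArray)

    type UVec = UArray Int Double

    -- The "non-unique" entry point written in terms of the in-place one:
    -- copy the input, then update the copy destructively inside ST.  One
    -- implementation, but the copy is exactly the allocation we were
    -- trying to avoid, so this only matches the imperative code when the
    -- caller genuinely wanted fresh storage.
    axpyCopying :: Double -> UVec -> UVec -> UVec
    axpyCopying a y x = runSTUArray $ do
      x' <- thaw x
      (lo, hi) <- getBounds x'
      forM_ [lo .. hi] $ \i -> do
        xi <- readArray x' i
        writeArray x' i (xi + a * (y ! i))
      return x'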

* We could imagine tracking uniqueness dynamically at run time, using something like reference counting for all our arrays. But we need to do the reference counting precisely---this is pretty much the most inefficient way possible of tracking the storage, and doesn't play well at all with using efficient GC elsewhere in our programs. The inefficiency might be worth it for arrays, but Haskell is polymorphic and in many cases we need to treat all our data the same way.

* Finally, I'll observe that we often want to use slightly different algorithms depending upon whether we're updating in place or computing into fresh storage. Often copying the data and then updating it in place is not actually a good idea.

I'd love to hear if anyone has insights or pointers to related work on any of the issues above; I'm especially keen to learn if there's work I didn't know about on fusion of multiple traversals. In my day job with Fortress we are looking at RULES-like approaches, but they founder quickly, because the kinds of problems David is trying to solve are 90% of what our programmers want to do.

-Jan-Willem Maessen

