This is a long post. Part 1 is background to motivate the implementation
outline in Part 2. If you already understand order analysis (i.e. the
asymptotic analysis of program runtime), then you already understand
everything I'll say in Part 1; you can skip to Part 2.

Part 1 - Motivation

So long as you work with data of definite size, APL can allocate the
correct amount of storage in time proportional to the size of that storage
(assuming that allocation must be accompanied by initialization). That's
pretty good. (There might be OS-dependent tricks to finesse the
initialization time, but that's not the subject of this note...)

If you have to work with data of indeterminate size and store that data in
an APL vector or array, that's where things get tricky.

The classic example of this is accumulating data elements from an input
stream of indeterminate length (a sketch in APL follows the steps):

0. Initialize an empty vector; this will accumulate data elements while
the program runs.
1. If there's no more data to read, we're done; the vector contains the
result.
2. Read a data element from the input stream and append it to the vector.
3. Go to step 1.
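
In GNU APL, that loop might look something like this minimal sketch (the
names are illustrative only, and the "read" step is simulated by counting
up to n, since the source of the data isn't the point):

∇ z←accumulate1 n;i
  z←⍬                 ⍝ step 0: start with an empty vector
  i←0
 more:→(i≥n)⍴0        ⍝ step 1: no more data? then z holds the result
  i←i+1
  z←z,i               ⍝ step 2: "read" the next element and append it
  →more               ⍝ step 3: go back to step 1
∇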

As this program runs, the vector will take on lengths in the series 0, 1,
2, 3, 4, 5, ... Each time the vector expands, all of the existing data must
be copied to the new vector before the next element is appended. If you
look at copying the data as the "cost" of this algorithm, you'll see that
this is a really inefficient way to do things: appending the nth element
copies n-1 elements, so accumulating n elements copies about
1 + 2 + ... + (n-1), or roughly n^2/2, elements in total. The runtime is
O(n^2). In other words, the runtime grows as the square of the input size.

One way to optimize this program is to extend the vector by a fixed amount
every time you need more space for the next element. In other words, rather
than making space for one more element, you'd make space for a hundred or a
thousand more; by doing so, you reduce the number of times you must copy
all the existing data to an expanded vector. This seems like a smart
approach, and it definitely gives better performance for a data set of a
given size. In the asymptotic limit, however, this is still an O(n^2)
algorithm.
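
To see why, suppose the fixed increment is k elements. A copy happens each
time the length crosses a multiple of k, and each copy moves everything
accumulated so far, so the total number of elements copied on the way to
length n is roughly

$$ k + 2k + 3k + \dots + n \;=\; k \sum_{i=1}^{n/k} i \;\approx\; \frac{n^2}{2k} $$

That's a factor of k cheaper than growing one element at a time, but still
quadratic in n.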

So how do you improve the runtime of this algorithm...? A common approach
is to double the size of the vector each time you run out of space, so the
vector's capacity grows very rapidly. Intuitively, each time you run out of
room you give yourself twice as much headroom before the next copy has to
happen. Individual appends still stall occasionally for a copy, but those
stalls come at geometrically increasing intervals, so the total copying
work is proportional to n: the overall runtime is O(n), i.e. amortized
constant time per append, rather than O(n^2).
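
Concretely: growing the vector from capacity c to capacity 2c copies c
elements, and the capacities form the series 1, 2, 4, 8, ... So by the time
the vector holds n elements, the total number of elements copied is

$$ 1 + 2 + 4 + \dots + \frac{C}{2} \;<\; C \;<\; 2n $$

where C is the final (possibly overallocated) capacity. Add the n appends
themselves and the whole accumulation runs in O(n) time.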

What's the downside? Your program will eat more memory than it really
needs. In fact, depending upon where you run out of input, the program may
have allocated up to twice as much memory as it really needs.

Here's where I argue that trading memory for time is a good thing... Memory
is cheap. If you're going to be solving a problem that taxes physical
memory on a modern computer, you (or your employer) can almost certainly
afford to double your machine's memory. Also, memory is slow. A modern
processor can execute lots and lots of cycles in the time it takes to read
or write one word of data in memory. This is why copies dominate the
runtime of many algorithms.

The trick in making this "double the size before copying" algorithm work
is to somehow manage that overallocated space. That's the subject of Part 2.



Part 2 - Double, Copy, Trim

I haven't yet looked at how all this might fit into GNU APL. Some of my
assumptions may be wrong. This section is intended to provoke discussion,
not to serve as a design sketch.

The "double the size before copying" part of the algorithm seems simple
enough. (Keep in mind that I'll use this phrase repeatedly without also
mentioning the necessary test for having run out of allocated-but-unused
space.) The only tricky parts are (a) knowing when to do it, (b) using a
new bit of header information that keeps track of allocated size as opposed
to actual size, and (c) handling the new case of mutating an existing
variable (specifically, writing the new element in place into previously
allocated-but-unused space) rather than allocating a fresh copy of a
precise size and copying the entire result.

Why is "knowing when to do it" tricky? I'm assuming that GNU APL evaluates
an expression of the form a←a,x by computing the size of a,x before
allocating a new a and copying all the data. We could blindly double the
size of a each time we need to make more room; at worst, our programs
would use about twice as much memory. But we might be able to do better...

What we need is an inexpensive way of predicting that a vector may be
subject to repeated expansion.

We could use static analysis of the program text: Every time we see an
expression of the form a←a,... we could choose the "double the size before
copying" technique. We might even strengthen the test by confirming that
the assignment appears inside a loop.

Alternatively, we might somehow look at access patterns. If the variable is
infrequently updated (this assumes that we can have a timestamp associated
with the variable's header), then we might take a chance and do minimal
(i.e. just enough space for the new element) reallocation. On the other
hand, if we're updating that variable frequently (as we'd do in a tight
loop), then it'd make sense to apply the "double the size before copying"
technique.

That's about all I can say for now about the double and copy parts. What
about trim?

Let's assume that we'd like to do better than potentially doubling the
runtime memory footprint of our GNU APL programs. What techniques might we
use to mitigate that memory growth?

We might set an upper bound on "overhead" memory allocation. (In this
case, the overhead is the allocated-but-unused space reserved in variables
that might be subject to growth.) That seems like an attractive approach
until you consider that whatever threshold you select for when to stop
reserving extra space, you could reach that threshold before hitting the
one part of your program that's really going to benefit from the
optimization. We really need to make this work without having to tweak
interpreter parameters on a per-program and per-run basis.

One possible approach might be to "garbage-collect" reserved space upon
some trigger condition. (Maybe memory pressure, maybe time, maybe something
else...) The downside is that this can fragment the heap. On the other
hand, we might get away with it thanks to VM and locality of access; it may
take some experiments to work out the best approach.

Another possible approach might be to dynamically prioritize which
variables are subject to reservation of extra space upon expansion, based
upon frequency of updates or how recently the variable was updated. Again,
experimentation may be necessary.

Finally, the same "double the size before copying" technique should be
applied to array assignments (e.g. a[m]←a[m],x).



Appendix

The attached test.apl file contains two functions: test1 and test2. (I'm
tempted to break into Dr. Seuss rhyme, but I'll refrain...)

The test1 function expands a vector by one element each time an element is
added.

The test2 function doubles the size of the vector each time the current
allocation fills up.
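
The attachment isn't reproduced inline, so here's the gist: test1 is
essentially the naive loop sketched in Part 1, and test2 does something
along these lines (a simplified, illustrative sketch rather than a verbatim
copy of the attached code): keep a fill count alongside a deliberately
overallocated buffer, write new elements in place, double when full, and
trim the reserve at the end.

∇ z←test2sketch n;buf;cap;i
  cap←1
  buf←cap⍴0           ⍝ buf is the overallocated buffer, cap its capacity
  i←0                 ⍝ i counts the elements actually in use
 more:→(i≥n)⍴done
  i←i+1
  →(i≤cap)⍴fits
  cap←2×cap           ⍝ out of room: double the capacity...
  buf←cap↑buf         ⍝ ...which is the only time the data gets copied
 fits:buf[i]←i        ⍝ update in place; no reallocation needed
  →more
 done:z←i↑buf         ⍝ trim the unused reserve before returning
∇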

For small N, you'd be hard pressed to notice a difference between the
runtime of test1 and test2. For example, try

      test1 5
      test2 5
      test1 50
      test2 50
      ... etc

At some point, you'll see a dramatic difference. On my machine, there's a
noticeable difference at 5000. At 50000, the difference is stunning.

-- 
"The secret to creativity is knowing how to hide your sources."
   Albert Einstein


http://soundcloud.com/davidlamkins
http://reverbnation.com/lamkins
http://reverbnation.com/lcw
http://lamkins-guitar.com/
http://lamkins.net/
http://successful-lisp.com/

