This is an interesting approach: having the interpreter automatically apply
the typical memory-allocation optimisations rather than you having to do
them yourself.

That said, I think it needs to be a bit more clever. There are many cases
in which one appends a single element to a large array; if every one of
those caused a doubling of memory, the effect could be quite serious. On
the other hand, if the algorithm only kicks in for arrays larger than some
threshold 𝓁, and the increment is made by a factor 𝒇, where 𝓁 is on the
order of, say, 30 and 𝒇 perhaps 1.5 (both configurable through a
quad-parameter, of course), then it will be easy to experiment to come up
with good numbers for a given workload.

Regards,
Elias
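A minimal sketch of such a growth rule, in C++ since that is what the
interpreter itself is written in. Nothing here is GNU APL code: the name
next_capacity, the defaults (threshold 30, factor 1.5) and the idea that
they would be read from a quad-parameter are assumptions for illustration
only.

#include <cmath>
#include <cstddef>

// Decide how many cells to allocate when a vector that currently has room
// for `capacity` cells must hold `needed` cells. Small arrays (below the
// threshold) grow by exactly as much as is needed; larger arrays grow
// geometrically by `factor`.
std::size_t next_capacity(std::size_t needed, std::size_t capacity,
                          std::size_t threshold = 30, double factor = 1.5)
{
    if (needed <= capacity) return capacity;     // enough slack already
    if (needed < threshold) return needed;       // small array: exact fit
    std::size_t grown = (std::size_t)std::ceil(capacity * factor);
    return grown < needed ? needed : grown;      // never less than required
}

With a factor of 1.5 the worst-case slack is about half of the data rather
than all of it, at the price of more frequent reallocations than plain
doubling; that trade-off is exactly what a configurable parameter would let
one experiment with.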
On 1 July 2014 13:44, David B. Lamkins <dlamk...@gmail.com> wrote:

> Again: proofread, *then* post... Sorry. :(
>
> - To clarify: Part 2 is not really an implementation outline, but
> rather a discussion of techniques.
>
> - The test.apl file's test2 function doesn't implement "double and
> copy", but rather "preallocate all". This is a much simpler case.
> Here's the "double and copy" function:
>
> ⍝ test3 starts with an empty vector and doubles its size every time we
> ⍝ need more space. I think that this'll be somewhat worse than O(n)
> ⍝ but better than O(n^2).
>
> ∇test3 n
> x←0⍴''
> i←0
> a←0
> loop:
> →(i≥n)/0
> i←i+1
> →(i≤a)/mutate
> x←x,(1⌈⍴x)⍴''
> a←⍴x
> mutate:
> x[i]←'x'
> →loop
> ∇
>
> Informally (i.e. I didn't use a stopwatch or timing function), test3 is
> about 2/3 as fast as test2 with N=5000000. Keep in mind that there are
> a few more instructions in the test3 loop.
>
> On Mon, 2014-06-30 at 21:34 -0700, David Lamkins wrote:
> > This is a long post. Part one is background to motivate the
> > implementation outline in part two. If you already understand order
> > analysis (i.e. the asymptotic analysis of program runtime), then you
> > already understand everything I'll say in Part 1; you can skip to
> > Part 2.
> >
> > Part 1 - Motivation
> >
> > So long as you can work with data of definite size, APL is able to
> > allocate the correct amount of storage in time proportional to the
> > size of the storage (assuming that allocation must be accompanied by
> > initialization). That's pretty good. (There might be OS-dependent
> > tricks to finesse the initialization time, but that's not the
> > subject of this note...)
> >
> > If you have to work with data of indeterminate size and store that
> > data in an APL vector or array, that's where things get tricky.
> >
> > The classic example of this is accumulating data elements from an
> > input stream of indeterminate length:
> >
> > 0. Initialize an empty vector; this will, while the program is
> > running, accumulate data elements.
> >
> > 1. If there's no more data to read, we're done. The vector contains
> > the result.
> >
> > 2. Read a data element from the input stream and append the element
> > to the vector.
> >
> > 3. Go to step 1.
> >
> > As this program runs, the vector will take on lengths in the series
> > 0, 1, 2, 3, 4, 5, ... Each time the vector expands, all of the data
> > must be copied to the new vector before appending the next element.
> > If you look at copying the data as the "cost" of this algorithm,
> > you'll see that this is a really inefficient way to do things: the
> > i-th append has to copy all i-1 elements that are already there. The
> > runtime is O(n^2). In other words, the runtime grows as the square
> > of the input size.
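As a concrete illustration of that cost, a small stand-alone model of the
loop above that counts element copies rather than measuring time. It is
neither APL nor interpreter code, and the element count of 50000 is
arbitrary.

#include <cstdio>

// Model of "grow by exactly one element per append": every append
// reallocates and therefore copies every element stored so far.
int main()
{
    const long long n = 50000;      // number of appends (arbitrary)
    long long length = 0;           // current vector length
    long long copies = 0;           // total element copies performed
    for (long long i = 0; i < n; ++i) {
        copies += length;           // reallocation copies the whole vector
        ++length;                   // then the new element is stored
    }
    std::printf("%lld appends cost %lld element copies (about n*n/2 = %lld)\n",
                n, copies, n * n / 2);
    return 0;
}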
> > One way to optimize this program is to extend the vector by a fixed
> > amount every time you need more space for the next element. IOW,
> > rather than making space for one more element, you'd make space for
> > a hundred or a thousand elements; by doing so, you reduce the number
> > of times you must copy all the existing data to an expanded vector.
> > This seems like a smart approach, and it definitely gives you better
> > performance for a data set of a given size. However, looking at the
> > asymptotic limit, this is still an O(n^2) algorithm: the copies
> > merely happen once per chunk instead of once per element.
> >
> > So how do you improve the runtime of this algorithm...? A common
> > approach is to double the size of the vector each time you run out
> > of space. The size of the vector will increase pretty rapidly.
> > Intuitively, you're giving yourself twice as much time to run a
> > linear algorithm each time you run out of space for a new element.
> > You'll still have to stop and copy, but at increasingly infrequent
> > intervals; each element ends up being copied only a constant number
> > of times on average, so the total work is linear in the amortized
> > sense, far better than O(n^2).
> >
> > What's the downside? Your program will eat more memory than it
> > really needs. In fact, depending upon where you run out of input,
> > the program may have allocated up to twice as much memory as it
> > really needs.
> >
> > Here's where I argue that trading memory for time is a good thing...
> > Memory is cheap. If you're going to be solving a problem that taxes
> > physical memory on a modern computer, you (or your employer) can
> > almost certainly afford to double your machine's memory. Also,
> > memory is slow: a modern processor can execute lots and lots of
> > cycles in the time it takes to read or write one word of data in
> > memory. This is why copies dominate the runtime of many algorithms.
> >
> > The trick in making this "double the size before copying" algorithm
> > work is to somehow manage that overallocated space. That's the
> > subject of Part 2.
> >
> > Part 2 - Double, Copy, Trim
> >
> > I haven't yet looked at how all this might fit into GNU APL. Some of
> > my assumptions may be wrong. This section is intended to provoke
> > discussion, not to serve as a design sketch.
> >
> > The "double the size before copying" part of the algorithm seems
> > simple enough. (Keep in mind that I'll use this phrase repeatedly
> > without also mentioning the necessary test for having run out of
> > allocated-but-unused space.) The only tricky parts are (a) knowing
> > when to do it, (b) adding a new bit of header information that keeps
> > track of allocated size as opposed to actual size, and (c) handling
> > the new case of mutating an existing variable (specifically, doing
> > an update in place over a previously allocated-but-unused element)
> > rather than allocating a fresh copy of precisely the right size and
> > copying the entire result.
> >
> > Why is "knowing when to do it" tricky? I'm assuming that GNU APL
> > evaluates an expression of the form a←a,x by computing the size of
> > a,x before allocating a new a and copying all the data. We could
> > blindly double the size of a each time we need to make more room; at
> > worst, our programs would use about twice as much memory. But we
> > might be able to do better...
> >
> > What we need is an inexpensive way of predicting that a vector may
> > be subject to repeated expansion.
> >
> > We could use static analysis of the program text: every time we see
> > an expression of the form a←a,... we could choose the "double the
> > size before copying" technique. We might even strengthen the test by
> > confirming that the assignment appears inside a loop.
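Pulling points (a) to (c) together, a minimal sketch of what that
bookkeeping could look like, assuming nothing about GNU APL's actual value
representation: CellVector, its field names, and the fixed doubling factor
are hypothetical, chosen only to show how a←a,x could become an update in
place whenever reserved space is available.

#include <cstdlib>
#include <cstddef>

// Hypothetical vector header: `length` is what ⍴ would report,
// `capacity` is how many cells are actually allocated.
struct CellVector {
    char*       data;
    std::size_t length;
    std::size_t capacity;
};

// Append one element, reusing reserved space when possible and growing
// geometrically (here: doubling) when the reserved space runs out.
// Allocation-failure handling is omitted for brevity.
void append_in_place(CellVector& v, char element)
{
    if (v.length == v.capacity) {                       // no slack left
        std::size_t new_cap = v.capacity ? 2 * v.capacity : 1;
        v.data = (char*)std::realloc(v.data, new_cap);  // the expensive copy
        v.capacity = new_cap;
    }
    v.data[v.length++] = element;                       // update in place
}

Whether the interpreter ever reserves more than it strictly needs in the
first place is the "knowing when to do it" question; the static-analysis
heuristic above and the access-pattern idea that follows are two ways of
deciding that per variable.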
> > Alternatively, we might somehow look at access patterns. If the
> > variable is infrequently updated (this assumes that we can have a
> > timestamp associated with the variable's header), then we might take
> > a chance and do a minimal reallocation (i.e. just enough space for
> > the new element). On the other hand, if we're updating that variable
> > frequently (as we'd do in a tight loop), then it'd make sense to
> > apply the "double the size before copying" technique.
> >
> > That's about all I can say for now about the double and copy parts.
> > What about trim?
> >
> > Let's assume that we'd like to do better than to potentially double
> > the runtime size of our GNU APL programs. What techniques might we
> > use to mitigate that memory growth?
> >
> > We might set an upper bound on "overhead" memory allocation. (In
> > this case, the overhead is the allocated-but-unused space reserved
> > in variables that might be subject to growth.) That seems like an
> > attractive approach until you consider that whatever threshold you
> > select for when to stop reserving extra space, you could reach that
> > threshold before hitting the one part of your program that would
> > really benefit from the optimization. We really need to make this
> > work without having to tweak interpreter parameters on a per-program
> > and per-run basis.
> >
> > One possible approach might be to "garbage-collect" reserved space
> > upon some trigger condition (maybe memory pressure, maybe time,
> > maybe something else...). The downside is that this can fragment the
> > heap. On the other hand, we might get away with it thanks to VM and
> > locality of access; it may take some experiments to work out the
> > best approach.
> >
> > Another possible approach might be to dynamically prioritize which
> > variables are subject to reservation of extra space upon expansion,
> > based upon frequency of updates or how recently the variable was
> > updated. Again, experimentation may be necessary.
> >
> > Finally, the same "double the size before copying" technique should
> > be applied to array assignments (e.g. a[m]←a[m],x).
> >
> > Appendix
> >
> > The attached test.apl file contains two functions: test1 and test2.
> > (I'm tempted to break into Dr. Seuss rhyme, but I'll refrain...)
> >
> > The test1 function expands a vector by one element each time an
> > element is added.
> >
> > The test2 function doubles the size of the vector each time the
> > current allocation fills up.
> >
> > For small N, you'd be hard pressed to notice a difference between
> > the runtimes of test1 and test2. For example, try
> >
> >       test1 5
> >       test2 5
> >       test1 50
> >       test2 50
> >       ... etc.
> >
> > At some point, you'll see a dramatic difference. On my machine,
> > there's a noticeable difference at 5000. At 50000, the difference is
> > stunning.
> >
> > --
> > "The secret to creativity is knowing how to hide your sources."
> >    Albert Einstein
> >
> > http://soundcloud.com/davidlamkins
> > http://reverbnation.com/lamkins
> > http://reverbnation.com/lcw
> > http://lamkins-guitar.com/
> > http://lamkins.net/
> > http://successful-lisp.com/
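For the trim step discussed in Part 2 above, a minimal sketch of the
"garbage-collect reserved space" idea: release slack, least recently
updated variables first, once the total reserved-but-unused space passes a
configurable bound. The Slot record, the last_update field, and the
max_slack bound are all hypothetical, not GNU APL's actual workspace
layout.

#include <algorithm>
#include <cstdlib>
#include <cstddef>
#include <vector>

// One growable variable in the workspace (hypothetical layout).
struct Slot {
    char*       data;
    std::size_t length;        // cells in use
    std::size_t capacity;      // cells allocated
    unsigned    last_update;   // bumped on every append (pseudo-timestamp)
};

// Release reserved space, coldest variables first, until the total slack
// drops below `max_slack`.
void trim_reserved(std::vector<Slot>& workspace, std::size_t max_slack)
{
    std::size_t slack = 0;
    for (const Slot& s : workspace) slack += s.capacity - s.length;
    if (slack <= max_slack) return;                 // under the bound: nothing to do

    // Visit the least recently updated variables first.
    std::vector<Slot*> by_age;
    for (Slot& s : workspace) by_age.push_back(&s);
    std::sort(by_age.begin(), by_age.end(),
              [](const Slot* a, const Slot* b) { return a->last_update < b->last_update; });

    for (Slot* s : by_age) {
        if (slack <= max_slack) break;
        slack -= s->capacity - s->length;
        if (s->length) s->data = (char*)std::realloc(s->data, s->length);
        else { std::free(s->data); s->data = nullptr; }
        s->capacity = s->length;                    // back to an exact fit
    }
}

Shrinking with realloc can fragment the heap, as the post notes; whether
that matters in practice, or is hidden by virtual memory and locality, is
something only experiments can settle.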