I agree with the general point, but is it really executed that often?
(good) APL code tends to do a lot with few operations. It would perhaps
make sense to measure how often the comma (catenate) function is actually
called in a typical program?

Regards,
Elias
On 1 Jul 2014 21:25, "Juergen Sauermann" <juergen.sauerm...@t-online.de>
wrote:

> Hi David,
>
> I agree with your analysis below. The problem with "in place" modification
> like A←A,x is a different one:
> the ownership of A may be unknown.
>
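> The core difficulty can be sketched in a few lines of C++ (using
> std::shared_ptr purely for illustration; GNU APL's actual value
> ownership machinery may look quite different):
>
>   #include <memory>
>   #include <vector>
>
>   using Cells = std::vector<double>;
>
>   // Append x to the value referenced by v.  Mutating in place is only
>   // safe when this reference is the sole owner; otherwise some other
>   // name (or an intermediate result) still needs the old contents and
>   // we must copy first.
>   void append(std::shared_ptr<Cells> &v, double x)
>   {
>       if (v.use_count() != 1)               // ownership shared/unknown:
>           v = std::make_shared<Cells>(*v);  // copy before modifying
>       v->push_back(x);                      // now safe to grow in place
>   }
>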
> Some time ago we had an optimization of A⍴B modifying B in place when A
> was ≤⍴B. That looked good
> initially but backfired when static APL values (like ⍬) were modified, so
> we had to revert the optimization.
>
> The general problem when judging an optimization is its overall effect on
> execution time. That includes not only the time saved when the optimization
> can be applied but also the time spent checking for it when it can't.
>
> For example, detecting a pattern like A←A,x costs (not very much, but) a
> little. Unfortunately that little extra
> cost is executed very often. So if we do such an optimization, the code
> may get faster (if the time saved by the optimization is larger than the
> extra cost), but it could also get slower (if A←A,x is rarely optimized).
>
> And often different APL code does the trick, for example A[IDX←IDX+1]←x
> if A is expected to be large.
> This is used, by the way, in 10 ⎕CR. The "double the size" trick is used
> internally in GNU APL (see Simple_string.hh).
> It can be further improved by putting a lower bound on the size so that
> doubling of small vectors is avoided.
>
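> For reference, the growth rule itself is tiny; a minimal sketch of the
> idea in C++ (not the actual Simple_string.hh code):
>
>   #include <cstddef>
>
>   // Capacity to use when a buffer of capacity 'cap' has filled up:
>   // double it, but never drop below a minimum so that small vectors
>   // are not reallocated one element at a time.
>   inline std::size_t next_capacity(std::size_t cap, std::size_t min_cap = 16)
>   {
>       const std::size_t doubled = 2 * cap;
>       return doubled < min_cap ? min_cap : doubled;
>   }
>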
> So the final question is: shall we pay a price for the optimization of
> sub-optimal APL code? I tend to say "no".
>
> /// Jürgen
>
>
> On 07/01/2014 06:34 AM, David Lamkins wrote:
>
>> This is a long post. Part one is background to motivate the
>> implementation outline in part two. If you already understand order
>> analysis (i.e. the asymptotic analysis of program runtime), then you
>> already understand everything I'll say in Part 1; you can skip to Part 2.
>>
>> Part 1 - Motivation
>>
>> So long as you can work with data of definite size, APL is able to
>> allocate the correct amount of storage in time proportional to the size of
>> that storage (assuming that allocation must be accompanied by
>> initialization). That's pretty good. (There might be OS-dependent tricks to
>> finesse the initialization time, but that's not the subject of this note...)
>>
>> If you have to work with data of indeterminate size and store that data
>> in an APL vector or array, that's where things get tricky.
>>
>> The classic example of this is accumulating data elements from an input
>> stream of indeterminate length:
>>
>> 0. Initialize an empty vector; this will accumulate data elements while
>> the program is running.
>> 1. If there's no more data to read, we're done. The vector contains the
>> result.
>> 2. Read a data element from the input stream and append the element to
>> the vector.
>> 3. Go to step 1.
>>
>> As this program runs, the vector will take on lengths in the series 0, 1,
>> 2, 3, 4, 5, ... Each time the vector expands, all of the data must be
>> copied to the new vector before appending the next element. If you look at
>> copying the data as the "cost" of this algorithm, you'll see that this is a
>> really inefficient way to do things: appending n elements copies
>> 0+1+2+...+(n-1) = n(n-1)/2 elements in total. The runtime is O(n^2). In
>> other words, the runtime grows as the square of the input size.
>>
>> One way to optimize this program is to extend the vector by a fixed size
>> every time you need more space for the next element. In other words, rather
>> than making space for one more element, you'd make space for a hundred or a
>> thousand elements; by doing so, you reduce the number of times you must
>> copy all the existing data to an expanded vector. This seems like a smart
>> approach. It definitely gives you better performance for a data set of a
>> given size. However, looking at the asymptotic limit, this is still an
>> O(n^2) algorithm: growing in blocks of k elements still copies roughly
>> n^2/(2k) elements in total, so the constant factor shrinks but the
>> quadratic growth remains.
>>
>> So how do you improve the runtime of this algorithm...? A common approach
>> is to double the size of the vector each time you run out of space. The
>> size of the vector will increase pretty rapidly. Intuitively, you're giving
>> yourself twice as much time to run a linear algorithm each time you run out
>> of space for a new element. Individual appends still aren't constant time
>> (you'll still have to stop and copy, though at increasingly infrequent
>> intervals), but the total copying work is bounded by roughly 2n element
>> moves, so the overall runtime is amortized linear and far better than
>> O(n^2).
>>
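>> To make the difference concrete, here is a small, self-contained C++
>> sketch (my own illustration, not GNU APL code) that simply counts how
>> many element copies each growth policy performs:
>>
>>   #include <cstddef>
>>   #include <cstdio>
>>
>>   // Count element copies made while appending n elements, reallocating
>>   // whenever the buffer is full.  'grow' maps old capacity to new.
>>   template <typename Grow>
>>   std::size_t copies_for(std::size_t n, Grow grow)
>>   {
>>       std::size_t used = 0, cap = 0, copies = 0;
>>       for (std::size_t i = 0; i < n; ++i)
>>       {
>>           if (used == cap)        // out of space: reallocate
>>           {
>>               copies += used;     // every existing element is copied
>>               cap = grow(cap);
>>           }
>>           ++used;                 // append the new element
>>       }
>>       return copies;
>>   }
>>
>>   int main()
>>   {
>>       const std::size_t sizes[] = { 5000, 50000, 500000 };
>>       for (std::size_t n : sizes)
>>           std::printf("n=%zu  grow-by-1: %zu copies  doubling: %zu copies\n",
>>                       n,
>>                       copies_for(n, [](std::size_t c) { return c + 1; }),
>>                       copies_for(n, [](std::size_t c) { return c ? 2 * c : 1; }));
>>   }
>>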
>> What's the downside? Your program will eat more memory than it really
>> needs. In fact, depending upon where you run out of input, the program may
>> have allocated up to twice as much memory as it really needs.
>>
>> Here's where I argue that trading memory for time is a good thing...
>> Memory is cheap. If you're going to be solving a problem that taxes
>> physical memory on a modern computer, you (or your employer) can almost
>> certainly afford to double your machine's memory. Also, memory is slow. A
>> modern processor can execute lots and lots of cycles in the time it takes
>> to read or write one word of data in memory. This is why copies dominate
>> the runtime of many algorithms.
>>
>> The trick in making this "double the size before copying" algorithm work
>> is to somehow manage that overallocated space. That's the subject of Part 2.
>>
>>
>>
>> Part 2 - Double, Copy, Trim
>>
>> I haven't yet looked at how all this might fit into GNU APL. Some of my
>> assumptions may be wrong. This section is intended to provoke discussion,
>> not to serve as a design sketch.
>>
>> The "double the size before copying" part of the algorithm seems simple
>> enough. (Keep in mind that I'll use this phrase repeatedly without also
>> mentioning the necessary test for having run out of allocated-but-unused
>> space.) The only tricky parts are (a) knowing when to do it, (b) using a
>> new bit of header information that keeps track of allocated size as opposed
>> to actual size, and (c) handling the new case of mutating an existing
>> variable (specifically, doing an update-in-place over a previously
>> allocated-but-unused element) rather than allocating a fresh copy of a
>> precise size and copying the entire result.
>>
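>> As a hedged illustration of points (b) and (c) only (all names here are
>> invented for the sketch, not taken from GNU APL's sources):
>>
>>   #include <algorithm>
>>   #include <cstddef>
>>
>>   struct GrowableValue               // hypothetical value header
>>   {
>>       double      *ravel = nullptr;  // element storage
>>       std::size_t  used = 0;         // length the APL program sees
>>       std::size_t  allocated = 0;    // storage actually reserved
>>   };
>>
>>   // a ← a , x : if spare capacity remains, update in place; otherwise
>>   // reallocate with room to spare ("double the size before copying").
>>   void append(GrowableValue &a, double x)
>>   {
>>       if (a.used == a.allocated)
>>       {
>>           const std::size_t new_alloc = a.allocated ? 2 * a.allocated : 16;
>>           double *bigger = new double[new_alloc];
>>           std::copy(a.ravel, a.ravel + a.used, bigger);
>>           delete [] a.ravel;
>>           a.ravel = bigger;
>>           a.allocated = new_alloc;
>>       }
>>       a.ravel[a.used++] = x;         // no full copy on the common path
>>   }
>>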
>> Why is "knowing when to do it" tricky? I'm assuming that GNU APL
>> evaluates an expression of the form a←a,x by computing the size of a,x
>> before allocating a new a and copying all the data. We could blindly double
>> the size of a each time we need to make more room; at the worst, our
>> programs would use about twice as much memory. But we might be able to do
>> better...
>>
>> What we need is an inexpensive way of predicting that a vector may be
>> subject to repeated expansion.
>>
>> We could use static analysis of the program text: Every time we see an
>> expression of the form a←a,... we could choose the "double the size before
>> copying" technique. We might even strengthen the test by confirming that
>> the assignment appears inside a loop.
>>
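>> A very rough sketch of such a textual check (not tied to GNU APL's
>> actual tokenizer, and limited to ASCII names just to keep it short):
>>
>>   #include <regex>
>>   #include <string>
>>
>>   // Does a statement look like "NAME←NAME,..." ?  A real implementation
>>   // would inspect the token stream instead of the source text, and
>>   // would also want to know whether the statement sits inside a loop.
>>   bool looks_like_append(const std::string &line)
>>   {
>>       static const std::regex pat(
>>           "^\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*←\\s*\\1\\s*,");
>>       return std::regex_search(line, pat);
>>   }
>>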
>> Alternatively, we might somehow look at access patterns. If the variable
>> is infrequently updated (this assumes that we can have a timestamp
>> associated with the variable's header), then we might take a chance and do
>> minimal (i.e. just enough space for the new element) reallocation. On the
>> other hand, if we're updating that variable frequently (as we'd do in a
>> tight loop), then it'd make sense to apply the "double the size before
>> copying" technique.
>>
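>> One possible shape for that heuristic, with every name and threshold
>> invented here purely for illustration:
>>
>>   #include <chrono>
>>   #include <cstddef>
>>
>>   using Clock = std::chrono::steady_clock;
>>
>>   // Choose an allocation size for a value that has just run out of
>>   // room.  A value appended to within the last few milliseconds is
>>   // probably in a tight accumulation loop, so reserve aggressively;
>>   // otherwise reallocate with minimal slack.
>>   std::size_t next_allocation(std::size_t used, Clock::time_point last_update)
>>   {
>>       const auto idle = Clock::now() - last_update;
>>       if (idle < std::chrono::milliseconds(10))   // arbitrary threshold
>>           return used ? 2 * used : 16;            // hot: double the size
>>       return used + 1;                            // cold: just enough room
>>   }
>>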
>> That's about all I can say for now about the double and copy parts. What
>> about trim?
>>
>> Let's assume that we'd like to do better than potentially doubling the
>> runtime size of our GNU APL programs. What techniques might we use to
>> mitigate that memory growth?
>>
>> We might set an upper bound on "overhead" memory allocation. (In this
>> case, the overhead is the allocated-but-unused space reserved in variables
>> that might be subject to growth.) That seems like an attractive approach
>> until you consider that, whatever threshold you select at which to stop
>> reserving extra space, you could reach that threshold before hitting the
>> one part of your program that's really going to benefit from the
>> optimization. We really need to make this work without having to tweak
>> interpreter parameters on a per-program and per-run basis.
>>
>> One possible approach might be to "garbage-collect" reserved space upon
>> some trigger condition. (Maybe memory pressure, maybe time, maybe something
>> else...) The downside is that this can fragment the heap. On the other
>> hand, we might get away with it thanks to VM and locality of access; it may
>> take some experiments to work out the best approach.
>>
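>> One way such a trim pass might look (again, every name and threshold
>> below is invented for illustration; std::vector stands in for whatever
>> structure actually carries the reserved space):
>>
>>   #include <cstddef>
>>   #include <vector>
>>
>>   // Reserved-but-unused bytes held by one value.
>>   std::size_t reserve_of(const std::vector<double> &v)
>>   {
>>       return (v.capacity() - v.size()) * sizeof(double);
>>   }
>>
>>   // When the total overhead crosses some budget, give the spare space
>>   // back.  The trigger could be memory pressure, a timer, or an
>>   // explicit request; shrinking may fragment the heap, as noted above.
>>   void trim_reserves(const std::vector<std::vector<double> *> &tracked,
>>                      std::size_t budget)
>>   {
>>       std::size_t total = 0;
>>       for (auto *v : tracked)
>>           total += reserve_of(*v);
>>       if (total <= budget)
>>           return;
>>       for (auto *v : tracked)
>>           v->shrink_to_fit();     // non-binding request in the standard
>>   }
>>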
>> Another possible approach might be to dynamically prioritize which
>> variables are subject to reservation of extra space upon expansion, based
>> upon frequency of updates or how recently the variable was updated. Again,
>> experimentation may be necessary.
>>
>> Finally, the same "double the size before copying" technique should be
>> applied to array assignments (e.g. a[m]←a[m],x).
>>
>>
>>
>> Appendix
>>
>> The attached test.apl file contains two functions: test1 and test2. (I'm
>> tempted to break into Dr. Seuss rhyme, but I'll refrain...)
>>
>> The test1 function expands a vector by one element each time an element
>> is added.
>>
>> The test2 function doubles the size of the vector each time the current
>> allocation fills up.
>>
>> For small N, you'd be hard pressed to notice a difference between the
>> runtime of test1 and test2. For example, try
>>
>>       test1 5
>>       test2 5
>>       test1 50
>>       test2 50
>>       ... etc
>>
>> At some point, you'll see a dramatic difference. On my machine, there's a
>> noticeable difference at 5000. At 50000, the difference is stunning.
>>
>> --
>> "The secret to creativity is knowing how to hide your sources."
>>    Albert Einstein
>>
>>
>> http://soundcloud.com/davidlamkins
>> http://reverbnation.com/lamkins
>> http://reverbnation.com/lcw
>> http://lamkins-guitar.com/
>> http://lamkins.net/
>> http://successful-lisp.com/
>>
>
>
>
