Re: [music-dsp] efficient running max algorithm (Evan Balster)

2016-07-25 Thread Ethan Fenn
> - IIUC, none of the other algos mentioned can cheaply vary the window size
>  at run-time, right?

I don't think what I've been calling "the Tito method" can cheaply vary the
window size, unless I've missed something clever. Or unless you're willing
to accept a "crossover period" during which the results aren't strictly
correct.

The Lemire method (assuming your operator is something like min or max) can
be made to do it cheaply if you store the indices of the potential maxima
along with their values. If you do this, changing the window size is an
O(log(w)) operation. This means you'll lose the O(1) average if the window
size is changing every sample, but I'd guess it would be faster than your
method in the end.
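
Roughly what I have in mind, as a sketch (untested, and the names are made up):

#include <algorithm>
#include <cstddef>
#include <deque>
#include <utility>

// Lemire-style running max with stored indices.  The deque holds
// (index, value) pairs with strictly decreasing values from front to
// back, so the front is always the window maximum.
struct RunningMax {
    std::deque<std::pair<std::size_t, float>> q;  // potential maxima
    std::size_t n = 0;        // index of the next incoming sample
    std::size_t window = 1;   // current window size, in samples

    void set_window(std::size_t w) {
        window = w;
        evict();              // shrinking may expose a new maximum
    }

    float push(float x) {
        // Candidates that are <= x can never be the maximum again.
        while (!q.empty() && q.back().second <= x)
            q.pop_back();
        q.emplace_back(n++, x);
        evict();
        return q.front().second;
    }

    void evict() {
        // The deque is ordered by index, so the samples that have
        // fallen out of the window form a prefix that a binary search
        // can locate in O(log(w)) -- this is what makes changing the
        // window size cheap.
        std::size_t lo = (n > window) ? n - window : 0;
        auto it = std::lower_bound(q.begin(), q.end(), lo,
            [](const std::pair<std::size_t, float>& p, std::size_t v) {
                return p.first < v;
            });
        q.erase(q.begin(), it);
    }
};

Note that growing the window only takes full effect as new samples arrive, since candidates discarded under the smaller window are gone for good.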

> - None of them generalizes to other binary operators besides max,
>  right?  (mine works for any operator that is associative and
>  commutative)

I think that's correct. The Tito method will work if you add one more
condition on the operator -- that every value has an inverse. The Lemire
method, and the heap- or tree-based methods, are pretty specific to max
and/or min.
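
To make the inverse condition concrete (just a sketch, not Tito's actual code): with an invertible operator like +, a single accumulator is enough, because the sample leaving the window can be un-applied. There is no such inverse for max.

#include <cstddef>

// Running reduction over the last w samples for an operator that is
// associative, commutative and invertible (here +, inverse -).
// "max" has no inverse, so the outgoing sample cannot be un-applied.
void running_sum(const float* in, float* out, std::size_t n, std::size_t w)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += in[i];                  // apply the newest sample
        if (i >= w)
            acc -= in[i - w];          // un-apply the sample leaving the window
        out[i] = acc;                  // sum of the last min(i + 1, w) samples
    }
}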

> - Can I further optimise this algo by doing a linear search for the
>  first log2(w) samples?  Why? How?

No, this idea only makes sense if you're storing a sorted array of
potential maxima, like in the Lemire method. In your method there's not
really an array that you would want to search through!

-Ethan




Re: [music-dsp] efficient running max algorithm (Evan Balster)

2016-07-25 Thread Tito Latini
> Hi, OP here.
>
>
> Unfortunately I don't understand all you are saying, due to my lack of C
> and Z-transform skills.
>
> This is what I understood, please correct me if I'm wrong:
> - My algo is called a binary max-tree.  (When used with max as an
>  operator)
> - The other algo discussed is faster on average, but has spikes that are
>  higher.
>
>
> Some questions:
>
> - IIUC, none of the other algos mentioned can cheaply vary the window size
>  at run-time, right?

Your algo is not cheaper: the performance-time loop of your compiled
slidingsum.dsp contains two for-loops (from compute() in slidingsum.cpp):

for (int i=14; i>0; i--) fVec3[i] = fVec3[i-1];
for (int i=6; i>0; i--) fVec2[i] = fVec2[i-1];

and maxsize is 8 (maybe the loops are unrolled in this simple case but
it's not always Sunday).

A simple implementation of my suggested double buffering (the alternative
and better way to modulate the window size of the initially posted code)
uses two loops, one "from i to oldsize-1" (at most 7 iterations in this
case) and one "from 0 to i-1", updating at most 8 slots instead of
15+7=22 (the numbers are just for emphasis, and these loops may be
unrolled too). Besides, the two loops run only when winsize changes and
the new size is smaller than the old one. It is cheaper than your
generated code, even if the two loops are moved inside the
performance-time loop.
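
Schematically, only the shape of those two loops (a sketch with invented
names, not the actual double-buffering code; here I assume the per-slot
work is subtracting the stale values from a sum accumulator):

// i is the write position inside the circular window, so the oldest
// values start right there and may wrap around -- hence one run over
// [i, old_size-1] and one over [0, i-1], touching at most old_size
// slots, and only when the window actually shrinks.
void on_shrink(const float* buf, float* acc, int i, int old_size, int new_size)
{
    int drop = old_size - new_size;     // oldest values leaving the window
    for (int k = i; k < old_size && drop > 0; ++k, --drop)
        *acc -= buf[k];                 // first run: i .. old_size-1
    for (int k = 0; k < i && drop > 0; ++k, --drop)
        *acc -= buf[k];                 // second run: 0 .. i-1
}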

> - None of them generalizes to other binary operators besides max,
>  right?  (mine works for any operator that is associative and
>  commutative)

Nope. I'll show "max" for simplicity but the next example should work in
similar contexts. We can use a circular buffer, a binary max heap and
new functions to work with the heap instead of add/subtract to/from
the accumulators:

[just a simple idea, not the state of the art]

a) remove the oldest value from the heap
b) insert the new value into the heap

---[alternative 1]---

a) on the heap: replace the oldest value with the new one (binary search and swap)
b) on the heap: update the position of the new value (binary comparisons, i.e. sift)

coda:

c) update the circular buffer and increment the index (modulo winsize)
d) max is heap[0], the root

Again, if winsize changes and the new size is smaller than the old one, it
is necessary to remove the oldest "old_size - new_size" values from the heap.

---[alternative 2]---

We can avoid the binary search in (a) by using a heap whose elements are
the indexes of the buffer values, plus an array (or a different structure
for the heap elements, e.g. index and position packed into a uint32 if
maxsize fits in two bytes) that stores the position of each index within
the heap.

For example, with two arrays of type int called "heap" and "pos":

heap[k] = buffer_index;
pos[buffer_index] = k;

The binary-search-and-swap is unnecessary because we already know where
the oldest value sits on the heap: its heap position is
pos[curr_buffer_index], i.e. the element

heap[pos[curr_buffer_index]]

and we start the binary comparisons for the new value from that heap
position, because the buffer index stays the same (it is incremented
only after the insertion).

The max value is:

buffer[heap[0]]
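
Putting alternative 2 together, something like this (just a sketch with
invented names; the handling of a shrinking winsize described above is
omitted):

#include <limits>
#include <utility>

// heap[] holds buffer indexes, pos[] maps a buffer index back to its
// heap position, so pos[heap[k]] == k at all times.  Assumes
// winsize <= MAXSIZE.
struct HeapRunningMax {
    static const int MAXSIZE = 64;
    float buffer[MAXSIZE];
    int   heap[MAXSIZE];
    int   pos[MAXSIZE];
    int   winsize;
    int   idx;                // curr_buffer_index in the text above

    void init(int w) {
        winsize = w;
        idx = 0;
        for (int k = 0; k < w; ++k) {
            buffer[k] = -std::numeric_limits<float>::infinity();
            heap[k] = k;      // any order is a valid heap while all
            pos[k]  = k;      // values are equal (-inf)
        }
    }

    void swap_nodes(int a, int b) {
        std::swap(heap[a], heap[b]);
        pos[heap[a]] = a;
        pos[heap[b]] = b;
    }

    float tick(float x) {
        // (a) the oldest value lives at buffer index idx and we know
        //     its heap position directly: pos[idx].  Replace it.
        int k = pos[idx];
        buffer[idx] = x;
        // (b) restore the heap property from position k: sift up if
        //     the value grew, otherwise sift down ("binary comp").
        while (k > 0 && buffer[heap[k]] > buffer[heap[(k - 1) / 2]]) {
            swap_nodes(k, (k - 1) / 2);
            k = (k - 1) / 2;
        }
        for (;;) {
            int largest = k, l = 2 * k + 1, r = 2 * k + 2;
            if (l < winsize && buffer[heap[l]] > buffer[heap[largest]]) largest = l;
            if (r < winsize && buffer[heap[r]] > buffer[heap[largest]]) largest = r;
            if (largest == k) break;
            swap_nodes(k, largest);
            k = largest;
        }
        // (c) advance the circular buffer index (modulo winsize)
        idx = (idx + 1) % winsize;
        // (d) the window maximum is buffer[heap[0]]
        return buffer[heap[0]];
    }
};

Each sample costs one sift through the heap, i.e. O(log2(winsize))
comparisons in the worst case.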

> - Can I further optimise this algo by doing a linear search for the
>  first log2(w) samples?  Why? How?

Where? :)



Re: [music-dsp] efficient running max algorithm (Evan Balster)

2016-07-25 Thread Evan Balster
Hey, Bart ---

The complexity of the rolling-max algorithm we discussed is always O(N) to
process N samples.  Using the algorithm from the paper Ross linked, the
worst case for a single sample is O(w); with Ethan Fenn's binary-search
modification, the worst case complexity for one sample is O(log2(w)).  So
scaling the window size is fairly cheap, especially with Ethan's variant.
This algorithm can find maximum and/or minimum.

If a linear search is not performed prior to the binary search, the
amortized complexity of the rolling-max algorithm becomes something like
O(N*log2(w)).  It's a bit hard to explain why without illustration;
roughly, the linear search lets most samples be handled in O(1) -- each
candidate is pushed and popped at most once -- whereas an unconditional
binary search pays O(log2(w)) on every sample.

A max-tree will consistently take O(log2(w)) per sample in the worst case
(where samples of increasing value are supplied).  Thus the worst-case
complexity for N samples is O(N*log2(w)).
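
For reference, one way to picture that (a sketch over a circular buffer of
leaves, not Bart's actual Faust code; names are made up):

#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Max-tree over a sliding window: the leaves hold the last w samples,
// each internal node the max of its two children.  Every new sample
// overwrites the oldest leaf and refreshes its ~log2(w) ancestors.
struct MaxTree {
    std::vector<float> node;  // node[1] is the root, leaves at [w, 2w)
    std::size_t w;
    std::size_t head = 0;     // leaf holding the oldest sample

    explicit MaxTree(std::size_t window) : w(window) {
        // Until the window fills up, the empty slots count as -inf.
        node.assign(2 * w, -std::numeric_limits<float>::infinity());
    }

    float push(float x) {
        std::size_t i = w + head;
        node[i] = x;                               // overwrite the oldest leaf
        for (i /= 2; i >= 1; i /= 2)               // refresh the ancestors
            node[i] = std::max(node[2 * i], node[2 * i + 1]);
        head = (head + 1) % w;
        return node[1];                            // maximum of the window
    }
};

This version always walks all ~log2(w) ancestors; stopping as soon as a
node's value no longer changes gives a better average, with monotonically
increasing input as the worst case.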

– Evan Balster
creator of imitone 


Re: [music-dsp] efficient running max algorithm (Evan Balster)

2016-07-25 Thread Bart Brouns

Hi, OP here.


Unfortunately I don't understand all you are saying, due to my lack of C
and Z-transform skills.

This is what I understood, please correct me if I'm wrong:
- My algo is called a binary max-tree.  (When used with max as an
 operator)
- The other algo discussed is faster on average, but has spikes that are
 higher.


Some questions:

- IIUC, none of the other algos mentioned can cheaply vary the window size
 at run-time, right?
- None of them generalizes to other binary operators besides max,
 right?  (mine works for any operator that is associative and
 commutative)
- Can I further optimise this algo by doing a linear search for the
 first log2(w) samples?  Why? How?


Many thanks,
Bart
___
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp