Re: [Numpy-discussion] Fast sizes for FFT

2014-12-24 Thread Julian Taylor
I still have the plan to add this function as public api to numpy's fft
helper functions, though I didn't get to it yet.
It's a relatively simple task if someone wants to contribute.

On 24.12.2014 04:33, Robert McGibbon wrote:
> Alex Griffing pointed out on github that this feature was recently added
> to scipy in https://github.com/scipy/scipy/pull/3144. Sweet!
> 
> -Robert
> 
> On Tue, Dec 23, 2014 at 6:47 PM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
> 
> 
> 
> On Tue, Dec 23, 2014 at 7:32 PM, Robert McGibbon wrote:
> 
> Hey,
> 
> The performance of fftpack depends very strongly on the array
> size -- sizes that are powers of two are good, but also powers
> of three, five and seven, or numbers whose only prime factors
> are from (2,3,5,7). For problems that can use padding, rounding
> up the size (using np.fft.fft(x, n=size_with_padding)) to one of
> these multiples makes a big difference.
> 
> Some other packages expose a function for calculating the next
> fast size, e.g: http://ltfat.sourceforge.net/notes/ltfatnote017.pdf.
> 
> Is there anything like this in numpy/scipy? If not, would this
> be a reasonable feature to add?
> 
> 
> It would be nice to have, but an integrated system would combine it
> with padding and windowing. Might be worth putting together a
> package, somewhat like seaborn for plotting, that provides a nicer
> interface to the fft module. Tracking downsampling/upsampling and
> units would also be useful. I don't know if anyone has done
> something like that already...
> 
> Chuck
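[Archive note: the padding call Robert describes can be spelled out as below. This is an editorial sketch using only NumPy's standard fft API; the length 10125 = 3**4 * 5**3 is, by hand calculation, the smallest 5-smooth number at or above the prime 10007.]

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(10007)  # awkward (prime) length

# Round the transform size up to a 5-smooth number; np.fft.rfft
# zero-pads x out to n before transforming.
nfast = 10125  # = 3**4 * 5**3, smallest 5-smooth size >= 10007
X = np.fft.rfft(x, n=nfast)
assert X.shape == (nfast // 2 + 1,)
```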
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org 
> http://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Fast sizes for FFT

2014-12-24 Thread Sturla Molden
On 24/12/14 04:33, Robert McGibbon wrote:
> Alex Griffing pointed out on github that this feature was recently added
> to scipy in https://github.com/scipy/scipy/pull/3144. Sweet!


I use a different padsize search than the one in SciPy. It would be 
interesting to see which is faster.


from numpy cimport intp_t

cdef intp_t checksize(intp_t n):
    while not (n % 5): n /= 5
    while not (n % 3): n /= 3
    while not (n % 2): n /= 2
    return (1 if n == 1 else 0)

def _next_regular(target):
    cdef intp_t n = target
    while not checksize(n):
        n += 1
    return n



Sturla




Re: [Numpy-discussion] Fast sizes for FFT

2014-12-24 Thread Sturla Molden
On 24/12/14 13:07, Sturla Molden wrote:

> cdef intp_t checksize(intp_t n):
>     while not (n % 5): n /= 5
>     while not (n % 3): n /= 3
>     while not (n % 2): n /= 2
>     return (1 if n == 1 else 0)
>
> def _next_regular(target):
>     cdef intp_t n = target
>     while not checksize(n):
>         n += 1
>     return n


Blah, old code, with current Cython this should be:


from numpy cimport intp_t
cimport cython

@cython.cdivision(True)
cdef intp_t checksize(intp_t n):
    while not (n % 5): n //= 5
    while not (n % 3): n //= 3
    while not (n % 2): n //= 2
    return (1 if n == 1 else 0)

def _next_regular(target):
    cdef intp_t n = target
    while not checksize(n):
        n += 1
    return n



Sturla







Re: [Numpy-discussion] Fast sizes for FFT

2014-12-24 Thread Julian Taylor
On 24.12.2014 13:07, Sturla Molden wrote:
> On 24/12/14 04:33, Robert McGibbon wrote:
>> Alex Griffing pointed out on github that this feature was recently added
>> to scipy in https://github.com/scipy/scipy/pull/3144. Sweet!
> 
> 
> I use different padsize search than the one in SciPy. It would be 
> interesting to see which is faster.
> 

Hm, this is a brute-force search; probably fast enough, but likely slower
than scipy's code (if that were also Cython).

I also ported it to C a while back, so that could be used for numpy if
speed is an issue. Code attached.

static inline unsigned long get_msb(unsigned long a)
{
/* dumb integer msb, (63 - __builtin_clzl(a)) would be faster */
unsigned long msb = 0;
while (a >>= 1)  {
msb++;
}
return msb;
}

/* ---*/
/**
 * @brief  get next 5-smooth number next to a certain value
 * @param a  lower bound of result
 *
 * 5-smooth number is a number with only prime factors 2, 3 or 5.
 * This can be used to get minimal padding sizes that still allow efficient fft
 * transforms.
 */
/* ---*/
size_t visir_get_next_regular(size_t a)
{
/* fftw can also deal efficiently with factors of 7 and 13 but it needs
 * testing that the speed of these algorithms will not cancel out the gain
 * of smaller input sizes */
if (a <= 6)
return a;
/* is already power of 2 */
if ((a & (a - 1)) == 0)
return a;
/* overflow */
if (5 > SIZE_MAX / a)
return a;

size_t match = SIZE_MAX;
size_t p5 = 1;
while(p5 < a) {
size_t p35 = p5;
while (p35 < a) {
/* ceil division */
size_t quotient = a % p35 != 0 ? a / p35 + 1 : a / p35;
/* next power of two of quotient */
size_t p2 = 2 << get_msb(quotient - 1);

size_t n = p2 * p35;
if (n == a)
return n;
else if (n < match)
match = n;

p35 *= 3;
if (p35 == a)
return p35;
}
if (p35 < match)
match = p35;
p5 *= 5;
if (p5 == a)
return p5;
}
if (p5 < match)
match = p5;
return match;
}
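[Archive note: for experimenting outside of C, here is a rough Python transcription of the search above. The function name and structure follow the attached C; `bit_length` stands in for `get_msb`, and this is an editorial port, not the code that shipped anywhere.]

```python
def next_regular(a):
    """Smallest 5-smooth number >= a (Python transcription of the C above)."""
    if a <= 6:
        return a
    if a & (a - 1) == 0:  # already a power of two
        return a
    match = float('inf')
    p5 = 1                # powers of 5
    while p5 < a:
        p35 = p5          # p5 times powers of 3
        while p35 < a:
            quotient = -(-a // p35)                # ceil division
            p2 = 1 << (quotient - 1).bit_length()  # next power of two >= quotient
            n = p2 * p35
            if n == a:
                return n
            match = min(match, n)
            p35 *= 3
            if p35 == a:
                return p35
        match = min(match, p35)
        p5 *= 5
        if p5 == a:
            return p5
    return min(match, p5)
```

For example, `next_regular(10007)` rounds up to 10125 (= 3**4 * 5**3) rather than all the way to the next power of two, 16384.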



Re: [Numpy-discussion] Fast sizes for FFT

2014-12-24 Thread Sturla Molden
On 24/12/14 13:23, Julian Taylor wrote:

> hm this is a brute force search, probably fast enough but slower than
> scipy's code (if it also were cython)

That was what I thought as well when I wrote it. But it turned out that 
regular numbers are so close together and abundant that it was damn fast, 
even in Python :)

> I also ported it to C a while back, so that could be used for numpy if
> speed is an issue. Code attached.

Very nice :)


Sturla



Re: [Numpy-discussion] Fast sizes for FFT

2014-12-24 Thread Sturla Molden
On 24/12/14 04:33, Robert McGibbon wrote:

> Alex Griffing pointed out on github that this feature was recently added
> to scipy in https://github.com/scipy/scipy/pull/3144. Sweet!

I would rather have SciPy implement this with the overlap-and-add method 
than by padding the FFT. Overlap-and-add is more memory efficient 
for large n:

- It scales as O(n) instead of O(n log n).

- For short FIR filters overlap-and-add also allows us to use small 
radix-2 FFTs.

- Small FFT size also means that we can use a small Winograd FFT instead 
of Cooley-Tukey FFT, which reduces the number of floating point 
multiplications.

- A small look-up table is also preferable as it can be kept in cache.

- Overlap-and-add is also trivial to compute in parallel. This comes at 
the expense of using more memory, but it never requires more memory than 
just taking a long FFT.
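[Archive note: a minimal overlap-add FIR filter can be sketched with nothing but NumPy's FFT. The function name and block size below are editorial choices for illustration; this is not SciPy's implementation, and in practice nfft would itself be rounded to a fast size as discussed earlier in the thread.]

```python
import numpy as np

def fft_overlap_add(x, h, nfft=256):
    """Linear convolution of x with a short FIR filter h, block by block."""
    m = len(h)
    assert nfft >= 2 * m, "block size should comfortably exceed the filter"
    step = nfft - m + 1              # input samples consumed per block
    H = np.fft.rfft(h, nfft)         # filter spectrum, computed once
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), step):
        seg = x[start:start + step]
        # each nfft-point circular convolution is long enough to be linear
        yblk = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        end = min(start + nfft, len(y))
        y[start:end] += yblk[:end - start]   # overlapping tails add up
    return y

# agrees with direct convolution
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(31)
assert np.allclose(fft_overlap_add(x, h, nfft=128), np.convolve(x, h))
```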


This is also interesting:

https://jakevdp.github.io/blog/2013/08/28/understanding-the-fft/



Sturla







Re: [Numpy-discussion] Fast sizes for FFT

2014-12-24 Thread Sturla Molden
On 24/12/14 14:34, Sturla Molden wrote:
> I would rather have SciPy implement this with the overlap-and-add method
> rather than padding the FFT. Overlap-and-add is more memory efficient
> for large n:

(eh, the list should be)


- Overlap-and-add is more memory efficient for large n.

- It scales as O(n) instead of O(n log n).

- For short FIR filters overlap-and-add also allows us to use small
radix-2 FFTs.

- Small FFT size also means that we can use a small Winograd FFT instead 
of Cooley-Tukey FFT, which reduces the number of floating point
multiplications.

- A small look-up table is also preferable as it can be kept in cache.

- Overlap-and-add is also trivial to compute in parallel. This comes at
the expense of using more memory, but it never requires more memory than 
just taking a long FFT.



Sturla



[Numpy-discussion] simple reduction question

2014-12-24 Thread Neal Becker
What would be the most efficient way to compute:

c[j] = \sum_i (a[i] * b[i,j])

where a[i] is a 1-d vector, b[i,j] is a 2-d array?

This seems to be one way:

import numpy as np
a = np.arange (3)
b = np.arange (12).reshape (3,4)
c = np.dot (a, b).sum()

but np.dot returns a vector, which then needs further reduction.  Don't know if 
there's a better way.

-- 
-- Those who don't understand recursion are doomed to repeat it



Re: [Numpy-discussion] simple reduction question

2014-12-24 Thread Julian Taylor
On 24.12.2014 16:25, Neal Becker wrote:
> What would be the most efficient way to compute:
> 
> c[j] = \sum_i (a[i] * b[i,j])
> 
> where a[i] is a 1-d vector, b[i,j] is a 2-d array?
> 
> This seems to be one way:
> 
> import numpy as np
> a = np.arange (3)
> b = np.arange (12).reshape (3,4)
> c = np.dot (a, b).sum()
> 
> but np.dot returns a vector, which then needs further reduction.  Don't know 
> if 
> there's a better way.
> 

the formula maps nicely to einsum:

np.einsum("i,ij->", a, b)

should also be reasonably efficient, but that probably depends on your
BLAS library and the sizes of the arrays.
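[Archive note: the signature "i,ij->" sums over j as well, which matches Neal's dot(a, b).sum(); keeping the j axis in the output gives the per-column c[j]. A quick check with Neal's example arrays:]

```python
import numpy as np

a = np.arange(3)
b = np.arange(12).reshape(3, 4)

# c[j] = sum_i a[i] * b[i, j]  -- this is just the matrix-vector product
c = np.einsum("i,ij->j", a, b)
assert np.array_equal(c, np.dot(a, b))

# summing over j as well collapses it to a scalar, matching dot(a, b).sum()
s = np.einsum("i,ij->", a, b)
assert s == np.dot(a, b).sum()
```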


Re: [Numpy-discussion] simple reduction question

2014-12-24 Thread Nathaniel Smith
On Wed, Dec 24, 2014 at 3:25 PM, Neal Becker wrote:
> What would be the most efficient way to compute:
>
> c[j] = \sum_i (a[i] * b[i,j])
>
> where a[i] is a 1-d vector, b[i,j] is a 2-d array?

I think this formula is just np.dot(a, b). Did you mean c = \sum_j
\sum_i (a[i] * b[i, j])?

> This seems to be one way:
>
> import numpy as np
> a = np.arange (3)
> b = np.arange (12).reshape (3,4)
> c = np.dot (a, b).sum()
>
> but np.dot returns a vector, which then needs further reduction.  Don't know 
> if
> there's a better way.
>
> --
> -- Those who don't understand recursion are doomed to repeat it
>



-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] simple reduction question

2014-12-24 Thread Neal Becker
Nathaniel Smith wrote:

> On Wed, Dec 24, 2014 at 3:25 PM, Neal Becker  wrote:
>> What would be the most efficient way to compute:
>>
>> c[j] = \sum_i (a[i] * b[i,j])
>>
>> where a[i] is a 1-d vector, b[i,j] is a 2-d array?
> 
> I think this formula is just np.dot(a, b). Did you mean c = \sum_j
> \sum_i (a[i] * b[i, j])?
> 
>> This seems to be one way:
>>
>> import numpy as np
>> a = np.arange (3)
>> b = np.arange (12).reshape (3,4)
>> c = np.dot (a, b).sum()
>>
>> but np.dot returns a vector, which then needs further reduction.  Don't know
>> if there's a better way.
>>
>> --
Sorry, I was a bit confused there.  Actually, c = np.dot(a, b) was just what I 
needed.



Re: [Numpy-discussion] Fast sizes for FFT

2014-12-24 Thread Luke Pfister
On Wed, Dec 24, 2014 at 7:56 AM, Sturla Molden wrote:

> On 24/12/14 14:34, Sturla Molden wrote:
> > I would rather have SciPy implement this with the overlap-and-add method
> > rather than padding the FFT. Overlap-and-add is more memory efficient
> > for large n:
>
> (eh, the list should be)
>
>
> - Overlap-and-add is more memory efficient for large n.
>
> - It scales as O(n) instead of O(n log n).
>
> - For short FIR filters overlap-and-add also allows us to use small
> radix-2 FFTs.
>
> - Small FFT size also means that we can use a small Winograd FFT instead
> of Cooley-Tukey FFT, which reduces the number of floating point
> multiplications.
>
> - A small look-up table is also preferable as it can be kept in cache.
>
> - Overlap-and-add is also trivial to compute in parallel. This comes at
> the expense of using more memory, but it never requires more memory than
> just taking a long FFT.
>
>
>
> Sturla
>

Overlap-add would also be a great addition for convolution.  It gives a
sizeable speedup when convolving a short filter with a long signal.


Re: [Numpy-discussion] Fast sizes for FFT

2014-12-24 Thread RayS

At 06:47 PM 12/23/2014, you wrote:
The performance of fftpack depends very strongly on the array size 
-- sizes that are powers of two are good, but also powers of three, 
five and seven, or numbers whose only prime factors are from 
(2,3,5,7). For problems that can use padding, rounding up the size 
(using np.fft.fft(x, n=size_with_padding)) to one of these multiples 
makes a big difference.


Checking some of my old code, we had typically done:
N2 = 2**int(math.ceil(math.log(N,2)))
and then something like
abs(rfft(data[minX:maxX, ch]*weightingArray, N2))[1:numpy.floor(N2/2)+1] * norm_factor
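[Archive note: a small aside on the rounding line above. The float-log version can misbehave right at powers of two, since math.log(2**k, 2) may land a hair above k; an integer bit_length version cannot. A sketch:]

```python
import math

def next_pow2_float(n):
    # the float-log rounding from the post
    return 2 ** int(math.ceil(math.log(n, 2)))

def next_pow2_int(n):
    # integer-only equivalent, immune to floating-point rounding
    return 1 << (n - 1).bit_length()

assert next_pow2_int(10007) == 16384
assert next_pow2_int(1024) == 1024   # exact at powers of two
```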


In the PDF linked, I can see where N % 3,5,7 could really help; we 
just don't do giant FFTs (>2**22) often. However, users would often 
see long waits if/when they selected a prime N on a data plot and 
asked for the full FFT without padding enabled - like ~5 seconds on a Core 2.


When it's released we'll certainly use the new functionality.

- Ray


Re: [Numpy-discussion] simple reduction question

2014-12-24 Thread josef.pktd
On Wed, Dec 24, 2014 at 10:30 AM, Julian Taylor <jtaylor.deb...@googlemail.com> wrote:

> On 24.12.2014 16:25, Neal Becker wrote:
> > What would be the most efficient way to compute:
> >
> > c[j] = \sum_i (a[i] * b[i,j])
> >
> > where a[i] is a 1-d vector, b[i,j] is a 2-d array?
> >
> > This seems to be one way:
> >
> > import numpy as np
> > a = np.arange (3)
> > b = np.arange (12).reshape (3,4)
> > c = np.dot (a, b).sum()
> >
> > but np.dot returns a vector, which then needs further reduction.  Don't
> know if
> > there's a better way.
> >
>
> the formula maps nice to einsum:
>
> np.einsum("i,ij->", a, b)
>
> should also be reasonably efficient, but that probably depends on your
> BLAS library and the sizes of the arrays.
>

Hijacking a bit, since I was just trying to replicate various
multidimensional dot products with einsum.

Are the older timings for einsum still a useful guide?

e.g.
http://stackoverflow.com/questions/14758283/is-there-a-numpy-scipy-dot-product-calculating-only-the-diagonal-entries-of-the

I didn't pay a lot of attention to the einsum changes, since I haven't
really used it yet.

Josef
X V X.T but vectorized





Re: [Numpy-discussion] simple reduction question

2014-12-24 Thread Charles R Harris
On Wed, Dec 24, 2014 at 10:29 AM,  wrote:

>
>
> On Wed, Dec 24, 2014 at 10:30 AM, Julian Taylor <jtaylor.deb...@googlemail.com> wrote:
>
>> On 24.12.2014 16:25, Neal Becker wrote:
>> > What would be the most efficient way to compute:
>> >
>> > c[j] = \sum_i (a[i] * b[i,j])
>> >
>> > where a[i] is a 1-d vector, b[i,j] is a 2-d array?
>> >
>> > This seems to be one way:
>> >
>> > import numpy as np
>> > a = np.arange (3)
>> > b = np.arange (12).reshape (3,4)
>> > c = np.dot (a, b).sum()
>> >
>> > but np.dot returns a vector, which then needs further reduction.  Don't
>> know if
>> > there's a better way.
>> >
>>
>> the formula maps nice to einsum:
>>
>> np.einsum("i,ij->", a, b)
>>
>> should also be reasonably efficient, but that probably depends on your
>> BLAS library and the sizes of the arrays.
>>
>
> hijacking a bit since I was just trying to replicate various
> multidimensional dot products with einsum
>
> Are the older timings for einsum still a useful guide?
>
> e.g.
>
> http://stackoverflow.com/questions/14758283/is-there-a-numpy-scipy-dot-product-calculating-only-the-diagonal-entries-of-the
>
> I didn't pay a lot of attention to the einsum changes, since I haven't
> really used it yet.
>
>
It is quite a bit slower for dot products, but very convenient for stacked
arrays, vectors, and other such things that are complicated to do with dot
products. I find the extra execution time negligible in relation to the
savings in programming effort, but the tradeoff might be different for a
library.

Chuck