[Numpy-discussion] einsum slow vs (tensor)dot

2012-10-24 Thread George Nurser
Hi,

I was just looking at the einsum function.
To me, it's a really elegant and clear way of doing array operations, which
is the core of what numpy is about.
It removes the need to remember a range of functions, some of which I find
tricky (e.g. tile).

Unfortunately, the present implementation seems about 4-6x slower than dot or
tensordot for reasonably sized arrays.
I suspect this is because the implementation does not use BLAS/LAPACK calls.

cheers, George Nurser.

E.g. (in ipython on Mac OS X 10.6, python 2.7.3, numpy 1.6.2 from macports)
a = np.arange(600000.).reshape(1500,400)
b = np.arange(240000.).reshape(400,600)
c = np.arange(600)
d = np.arange(400)


%timeit np.einsum('ij,jk', a, b)

10 loops, best of 3: 156 ms per loop

%timeit np.dot(a,b)
10 loops, best of 3: 27.4 ms per loop

%timeit np.einsum('i,ij,j',d,b,c)

1000 loops, best of 3: 709 us per loop

%timeit np.dot(d,np.dot(b,c))

10000 loops, best of 3: 121 us per loop


or

abig = np.arange(4800.).reshape(6,8,100)
bbig = np.arange(1920.).reshape(8,6,40)


%timeit np.einsum('ijk,jil->kl', abig, bbig)

1000 loops, best of 3: 425 us per loop

%timeit np.tensordot(abig,bbig, axes=([1,0],[0,1]))

10000 loops, best of 3: 105 us per loop
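
The comparisons above only make sense if each einsum call computes the same result as its dot/tensordot counterpart. The following is a minimal sketch that checks this for the same array shapes. It assumes a newer NumPy release than the 1.6.2 used above, one in which np.einsum accepts an optimize argument; the variable names mat, vec and ten are made up for illustration.

```python
import numpy as np

# Same shapes as in the timings above.
a = np.arange(600000.).reshape(1500, 400)
b = np.arange(240000.).reshape(400, 600)
c = np.arange(600)
d = np.arange(400)
abig = np.arange(4800.).reshape(6, 8, 100)
bbig = np.arange(1920.).reshape(8, 6, 40)

# optimize=True lets einsum pick a contraction order and, where it can,
# evaluate pairwise steps with tensordot-style (BLAS-backed) calls.
mat = np.einsum('ij,jk', a, b, optimize=True)
vec = np.einsum('i,ij,j', d, b, c, optimize=True)
ten = np.einsum('ijk,jil->kl', abig, bbig, optimize=True)

# Each einsum expression should agree with its dot/tensordot equivalent.
print(np.allclose(mat, np.dot(a, b)))
print(np.allclose(vec, np.dot(d, np.dot(b, c))))
print(np.allclose(ten, np.tensordot(abig, bbig, axes=([1, 0], [0, 1]))))
```

Whether optimize=True closes the timing gap depends on the NumPy build and the BLAS it is linked against, so it is worth re-running the %timeit lines locally.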
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] np.linalg.lstsq with several columns all 0 => huge x ?

2012-10-24 Thread denis
Folks,
   np.linalg.lstsq of a random-uniform A 50 x 32 with 3 columns all 0
returns x[:3] 0 as expected,
but 4 columns all 0 => huge x:
lstsq (50, 32) with 4 columns all 0:
 [ -3.7e+09  -3.6e+13  -1.9e+13  -2.9e+12  7.3e-01 ...

This may be a roundoff problem, or even a Mac Altivec lapack bug,
not worth looking into. linalg.svd is ok though, odd.

Summary: if you run linalg.lstsq on big arrays,
either check max |x|
or regularize: do lstsq( vstack(( A, weight * eye(dim) )),
  hstack(( b, zeros(dim) )))

cheers
   -- denis
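
The regularization suggested in the summary can be sketched as below. This is only an illustration under stated assumptions: the helper name lstsq_ridge, the default weight, and the random test data are made up here, and rcond=None assumes a NumPy version that accepts it.

```python
import numpy as np

def lstsq_ridge(A, b, weight=1e-3):
    """Least squares on the augmented system [A; weight*I] x = [b; 0].

    Hypothetical helper: stacking weight * eye(dim) under A penalizes
    large x, so all-zero columns of A cannot blow up the solution.
    """
    dim = A.shape[1]
    A_aug = np.vstack((A, weight * np.eye(dim)))
    b_aug = np.hstack((b, np.zeros(dim)))
    x, residuals, rank, sv = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

# Random-uniform 50 x 32 matrix with 4 all-zero columns, as in the report.
rng = np.random.default_rng(0)
A_ridge = rng.uniform(size=(50, 32))
A_ridge[:, :4] = 0.0
b_ridge = rng.uniform(size=50)

x_ridge = lstsq_ridge(A_ridge, b_ridge)
print(np.abs(x_ridge).max())  # the "check max |x|" step from the summary
```

The weight trades a small bias in x for stability, so it should be small relative to the scale of A.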



Re: [Numpy-discussion] np.linalg.lstsq with several columns all 0 => huge x ?

2012-10-24 Thread josef . pktd
On Wed, Oct 24, 2012 at 1:33 PM, denis denis-bz...@t-online.de wrote:
> Folks,
>    np.linalg.lstsq of a random-uniform A 50 x 32 with 3 columns all 0
> returns x[:3] 0 as expected,
> but 4 columns all 0 => huge x:
> lstsq (50, 32) with 4 columns all 0:
>  [ -3.7e+09  -3.6e+13  -1.9e+13  -2.9e+12  7.3e-01 ...
>
> This may be a roundoff problem, or even a Mac Altivec lapack bug,
> not worth looking into. linalg.svd is ok though, odd.
>
> Summary: if you run linalg.lstsq on big arrays,
> either check max |x|
> or regularize: do lstsq( vstack(( A, weight * eye(dim) )),
>   hstack(( b, zeros(dim) )))

lstsq has an rcond argument that does (I think) essentially the same thing.
It might need to be increased in your example.

Josef
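
A minimal sketch of the rcond route, on made-up data of the same shape as in the report. Note that rcond is not identical to adding weight * eye(dim): it discards singular values smaller than rcond times the largest one, whereas the stacked identity damps them. The value 1e-10 and the variable names are assumptions for illustration.

```python
import numpy as np

# Random-uniform 50 x 32 matrix with 4 all-zero columns.
rng = np.random.default_rng(0)
A_r = rng.uniform(size=(50, 32))
A_r[:, :4] = 0.0
b_r = rng.uniform(size=50)

# Singular values below rcond * (largest singular value) are treated as
# zero, so the all-zero columns contribute nothing to the solution.
x_r, residuals_r, rank_r, sv_r = np.linalg.lstsq(A_r, b_r, rcond=1e-10)

print(rank_r)              # effective rank after the cutoff
print(np.abs(x_r).max())   # sanity check on the size of the solution
```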


> cheers
>    -- denis



[Numpy-discussion] how to pipe into numpy arrays?

2012-10-24 Thread Michael Aye
As numpy.fromfile seems to require full file-object functionality
such as seek, I cannot use it with the sys.stdin pipe.
So how could I stream a binary pipe directly into numpy?
I can imagine storing the data in a string and using StringIO, but the
files are 3.6 GB each as raw binary, and they would most likely take
much more space as a string object.
Reading binary files from disk is NOT the problem; I would just like to
avoid the temporary file if possible.




Re: [Numpy-discussion] how to pipe into numpy arrays?

2012-10-24 Thread Benjamin Root
On Wed, Oct 24, 2012 at 3:00 PM, Michael Aye kmichael@gmail.com wrote:

> As numpy.fromfile seems to require full file-object functionality
> such as seek, I cannot use it with the sys.stdin pipe.
> So how could I stream a binary pipe directly into numpy?
> I can imagine storing the data in a string and using StringIO, but the
> files are 3.6 GB each as raw binary, and they would most likely take
> much more space as a string object.
> Reading binary files from disk is NOT the problem; I would just like to
> avoid the temporary file if possible.


I haven't tried this myself, but there is a numpy.frombuffer() function as
well.  Maybe that could be used here?

Cheers!
Ben Root
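
One way the numpy.frombuffer() idea might look, as an untested-on-real-pipes sketch: read the stream in fixed-size chunks (no seek needed) and wrap each chunk with frombuffer. The helper names iter_chunks and read_all, the float64 dtype and the chunk size are assumptions for illustration, and io.BytesIO stands in for the pipe.

```python
import io
import numpy as np

def iter_chunks(stream, dtype=np.float64, chunk_items=1 << 20):
    """Yield 1-D arrays read sequentially from a binary stream."""
    itemsize = np.dtype(dtype).itemsize
    while True:
        data = stream.read(chunk_items * itemsize)
        if not data:
            break
        # frombuffer wraps the bytes without copying; the array is read-only.
        yield np.frombuffer(data, dtype=dtype)

def read_all(stream, dtype=np.float64, chunk_items=1 << 20):
    """Collect every chunk into one array (needs memory for both copies)."""
    chunks = list(iter_chunks(stream, dtype, chunk_items))
    if not chunks:
        return np.empty(0, dtype=dtype)
    return np.concatenate(chunks)

# io.BytesIO stands in for the pipe here.
payload = np.arange(10, dtype=np.float64).tobytes()
result = read_all(io.BytesIO(payload), chunk_items=3)
print(result)
```

On Python 3 the stream would be sys.stdin.buffer. For data in the 3.6 GB range, consuming iter_chunks directly avoids the extra copy made by concatenate. frombuffer raises ValueError if a chunk's length is not a multiple of the item size, which can happen with short reads from an unbuffered pipe.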


Re: [Numpy-discussion] Is there a way to reset an accumulate function?

2012-10-24 Thread Cera, Tim
On Wed, Oct 24, 2012 at 4:47 AM, Robert Kern robert.k...@gmail.com wrote:

> How about this?
>
>
> def nancumsum(x):
>     nans = np.isnan(x)
>     x = np.array(x)
>     x[nans] = 0
>     reset_idx = np.zeros(len(x), dtype=int)
>     reset_idx[nans] = np.arange(len(x))[nans]
>     reset_idx = np.maximum.accumulate(reset_idx)
>     cumsum = np.cumsum(x)
>     cumsum = cumsum - cumsum[reset_idx]
>     return cumsum


Thank you for putting in the time to look at this.

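It doesn't work for the first group of numbers if x[0] is non-zero: reset_idx
is 0 for that group, so cumsum[0] (that is, x[0]) gets subtracted from every
element of it.  One could perhaps concatenate a np.nan at the beginning to
force a reset, and adjust the returned array to not include the dummy value...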

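import numpy as np

def nancumsum(x):
    # Prepend a dummy NaN so the first group also starts with a reset.
    x = np.concatenate(([np.nan], x))
    nans = np.isnan(x)
    x = np.array(x)
    x[nans] = 0
    # For each position, the index of the most recent NaN.
    reset_idx = np.zeros(len(x), dtype=int)
    reset_idx[nans] = np.arange(len(x))[nans]
    reset_idx = np.maximum.accumulate(reset_idx)
    cumsum = np.cumsum(x)
    # Subtract the running total accumulated up to the last reset.
    cumsum = cumsum - cumsum[reset_idx]
    # Drop the dummy value.
    return cumsum[1:]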

>>> a
array([  4.,   1.,   2.,   0.,  18.,   5.,   6.,   0.,   8.,   9.],
      dtype=float32)

If no np.nan, then 'nancumsum' and 'np.cumsum' should be the same...

>>> np.cumsum(a)
array([  4.,   5.,   7.,   7.,  25.,  30.,  36.,  36.,  44.,  53.],
      dtype=float32)

>>> nancumsum(a)
array([  4.,   5.,   7.,   7.,  25.,  30.,  36.,  36.,  44.,  53.])

>>> a[3] = np.nan

>>> np.cumsum(a)
array([  4.,   5.,   7.,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
      dtype=float32)

>>> nancumsum(a)
array([  4.,   5.,   7.,   0.,  18.,  23.,  29.,  29.,  37.,  46.])

Excellent!

Kindest regards,
Tim


[Numpy-discussion] error of install numpy on linux redhat.

2012-10-24 Thread Jack Bryan

Hi all,
I am trying to install numpy from source, following http://www.scipy.org/Download ,
by running
git clone git://github.com/numpy/numpy.git numpy
But when I ran
python setup.py install
I got:
SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel
Where can I get python-dev?
I tried:
$ easy_install python-devel
Searching for python-devel
Reading http://pypi.python.org/simple/python-devel/
Couldn't find index page for 'python-devel' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
No local packages or download links found for python-devel
error: Could not find suitable distribution for Requirement.parse('python-devel')
and
$ easy_install python-dev
Searching for python-dev
Reading http://pypi.python.org/simple/python-dev/
Couldn't find index page for 'python-dev' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
No local packages or download links found for python-dev
error: Could not find suitable distribution for Requirement.parse('python-dev')
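
For context, python-dev and python-devel are operating-system packages (the Python C headers) rather than PyPI distributions, which is consistent with easy_install finding nothing; on Red Hat they normally come from the system package manager. A small sketch for checking whether the headers are visible to the interpreter that runs setup.py follows; the helper name python_header_status is made up for illustration, and it assumes an interpreter that ships the sysconfig module.

```python
import os
import sysconfig

def python_header_status():
    """Return the expected path of Python.h and whether it exists."""
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return header, os.path.isfile(header)

header, found = python_header_status()
print(header, "found" if found else "missing")
```

If the header is reported missing, the development package for that particular Python installation still needs to be installed before the numpy build can compile its C extensions.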

