Re: [Numpy-discussion] Timing array construction

2009-04-30 Thread Mark Janikas
Thanks Chris and Bruce for the further input.  I kind of like the "c_" method 
because it is still relatively speedy and easy to implement.  But the empty 
method seems to be closest to what is actually done no matter which direction 
you go in, i.e. preallocate space and insert.  I am in the process of ripping 
all of my zip calls out.  The profile of my first set of techniques is already 
significantly better.  This whole exercise has been very enlightening, as I 
spend so much time working on speeding up my algorithms and simple things like 
this should be tackled first.  Thanks again!

MJ 

-----Original Message-----
From: numpy-discussion-boun...@scipy.org 
[mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Christopher Barker
Sent: Thursday, April 30, 2009 12:16 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Timing array construction

Mark Janikas wrote:
> I have a lot of array constructions in my code that use
> NUM.array([list of values])... I am going to replace it with the
> empty allocation and insertion.

It may not be worth it, depending on where list_of_values comes from/is. 
A rule of thumb may be: it's going to be slow going from a numpy array 
to a regular old python list or tuple, back to a numpy array. If your 
data is a python list already, then np.array(list) is a fine choice.


>> def useAsArray(xCoords, yCoords):
>>     return NUM.asarray(zip(xCoords, yCoords))

Here are some of the issues with this one:

zip unpacks two generic python sequences, puts each pair of items into a 
tuple, and then puts the tuples in a list. Essentially this:

new_list = []
for i in range(len(xCoords)):
    new_list.append((xCoords[i], yCoords[i]))


In each iteration of that loop, it's indexing into the numpy arrays, 
making a python object out of them, putting them into a tuple, and 
appending that tuple to the list, which may have to re-allocate memory a 
few times.

Then the np.array() call loops through that list, unpacks each tuple, 
examines each python object, decides what it is, and turns it into a raw 
c-type to put into the array.

whereas:

def useEmpty(xCoords, yCoords):
  out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
  out[:,0] = xCoords
  out[:,1] = yCoords
  return out

allocates an array the right size.
directly copies the data from xCoords and yCoords to it.

that's it.

You can see why it's so much faster!

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Timing array construction

2009-04-30 Thread Bruce Southey
Mark Janikas wrote:
> Thanks Eric!
>
> I have a lot of array constructions in my code that use NUM.array([list of 
> values])... I am going to replace it with the empty allocation and insertion. 
>  It is indeed twice as fast as "c_" (when it matters, i.e. N is relatively 
> large):
>
> N, "c_", "empty"
> 100, 0.0007, 0.0230
> 200, 0.0007, 0.0002
> 400, 0.0007, 0.0002
> 800, 0.0020, 0.0002
> 1600, 0.0009, 0.0003
> 3200, 0.0010, 0.0003
> 6400, 0.0013, 0.0005
> 12800, 0.0058, 0.0032
>
> -----Original Message-----
> From: numpy-discussion-boun...@scipy.org 
> [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Eric Firing
> Sent: Wednesday, April 29, 2009 11:49 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Timing array construction
>
> Mark Janikas wrote:
>   
>> Hello All,
>>
>>  
>>
>> I was exploring some different ways to concatenate arrays, and using 
>> "c_" is the fastest by far.  Is there a difference I am missing that can 
>> account for the huge disparity?  Obviously the "zip" function makes the 
>> "as array" and "array" calls slower, but the same arguments (xCoords, 
>> yCoords) are being passed to the methods... so if there is no difference 
>> in the outputs (there doesn't appear to be) then what reason would I 
>> have to use "array" or "as array" in this context?  Thanks so much ahead 
>> of time..
>> 
>
> If you really want speed, use something like this:
>
> import numpy as np
> def useEmpty(xCoords, yCoords):
>  out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
>  out[:,0] = xCoords
>  out[:,1] = yCoords
>  return out
>
> It is quite a bit faster than using c_; more than a factor of two on my 
> machine for all your test cases.
>
> All your methods using zip and array are doing a lot of unpacking, 
> repacking, checking, iterating... Even the c_ method is slower than it 
> needs to be for this case because it is more general and flexible.
>
> Eric
>   
>>  
>>
>> MJ
>>
>>  
>>
>> ## Snippet ###
>> import numpy as NUM
>>
>> def useAsArray(xCoords, yCoords):
>>     return NUM.asarray(zip(xCoords, yCoords))
>>
>> def useArray(xCoords, yCoords):
>>     return NUM.array(zip(xCoords, yCoords))
>>
>> def useC(xCoords, yCoords):
>>     return NUM.c_[xCoords, yCoords]
>>
>> if __name__ == "__main__":
>>     from timeit import Timer
>>     import numpy.random as RAND
>>     import collections as COLL
>>
>>     resAsArray = COLL.defaultdict(float)
>>     resArray = COLL.defaultdict(float)
>>     resMat = COLL.defaultdict(float)
>>     numTests = 0.0
>>     sameTests = 0.0
>>     N = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
>>     for i in N:
>>         print "Time Join List into Array for N = " + str(i)
>>         xCoords = RAND.normal(10, 1, i)
>>         yCoords = RAND.normal(10, 1, i)
>>
>>         statement = 'from __main__ import xCoords, yCoords, useAsArray'
>>         t1 = Timer('useAsArray(xCoords, yCoords)', statement)
>>         resAsArray[i] = t1.timeit(10)
>>
>>         statement = 'from __main__ import xCoords, yCoords, useArray'
>>         t2 = Timer('useArray(xCoords, yCoords)', statement)
>>         resArray[i] = t2.timeit(10)
>>
>>         statement = 'from __main__ import xCoords, yCoords, useC'
>>         t3 = Timer('useC(xCoords, yCoords)', statement)
>>         resMat[i] = t3.timeit(10)
>>
>>     for n in N:
>>         print "%i, %0.4f, %0.4f, %0.4f" % (n, resAsArray[n],
>>                                            resArray[n], resMat[n])
>> ###
>>
>>  
>>
>> RESULT
>>
>> N, useAsArray, useArray, useC
>> 100, 0.0066, 0.0065, 0.0007
>> 200, 0.0137, 0.0140, 0.0008
>> 400, 0.0277, 0.0288, 0.0007
>> 800, 0.0579, 0.0577, 0.0008
>> 1600, 0.1175, 0.1289, 0.0009
>> 3200, 0.2291, 0.2309, 0.0012
>> 6400, 0.4561, 0.4564, 0.0013
>> 12800, 0.9218, 0.9122, 0.0019
>>
>>  
>>
>>  
>>
>> Mark Janikas
>>
>> Product Engineer
>>
>> ESRI, Geoprocessing
>>
>> 380 New York St.
>>
>> Redlands, CA 92373
>>
>> 909-793-2853 (2563)
>>
>> mjani...@esri.com 
>>
>>
>> 
>>
>> 
Hi,
You can also use column_stack, since the desired result is a 2-D array with 
one column per input, as in:
numpy.column_stack((xCoords, yCoords))
numpy.concatenate() is more general.

While not as fast as using numpy.empty(), it does provide a more 
readable and flexible syntax (for example, you do not have to know in 
advance how many columns).
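
A minimal sketch of the alternatives mentioned above, on small made-up arrays (the values and variable names are illustrative only, not from the thread). It only checks that the three calls give the same result; it makes no claim about relative speed.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# column_stack turns each 1-D input into one column of a 2-D array.
stacked = np.column_stack((x, y))

# np.c_ gives the same result through index syntax.
via_c = np.c_[x, y]

# concatenate is more general, so the inputs must be reshaped to columns first.
via_concat = np.concatenate((x[:, np.newaxis], y[:, np.newaxis]), axis=1)

assert stacked.shape == (3, 2)
assert np.array_equal(stacked, via_c)
assert np.array_equal(stacked, via_concat)

# Adding a third column needs no change to the calling pattern.
z = np.array([7.0, 8.0, 9.0])
three = np.column_stack((x, y, z))
assert three.shape == (3, 3)
```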

Bruce



Re: [Numpy-discussion] Timing array construction

2009-04-30 Thread Christopher Barker
Mark Janikas wrote:
> I have a lot of array constructions in my code that use
> NUM.array([list of values])... I am going to replace it with the
> empty allocation and insertion.

It may not be worth it, depending on where list_of_values comes from/is. 
A rule of thumb may be: it's going to be slow going from a numpy array 
to a regular old python list or tuple, back to a numpy array. If your 
data is a python list already, then np.array(list) is a fine choice.
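
To make the rule of thumb concrete, here is a small sketch with made-up values (the variable names are illustrative only). It shows the one-step conversion from a list, the array-to-list-to-array round trip that is worth avoiding, and that np.asarray returns an existing array as-is.

```python
import numpy as np

# Data that starts out as a Python list: one conversion, which is fine.
values = [1.5, 2.5, 3.5]
arr = np.array(values)

# The round trip to avoid: array -> list -> array.
# It gives the same data back, but every element passes through a Python object.
round_trip = np.array(arr.tolist())
assert np.array_equal(arr, round_trip)

# If the data is already an array, asarray returns the same object without copying.
assert np.asarray(arr) is arr
```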


>> def useAsArray(xCoords, yCoords):
>>     return NUM.asarray(zip(xCoords, yCoords))

Here are some of the issues with this one:

zip unpacks two generic python sequences, puts each pair of items into a 
tuple, and then puts the tuples in a list. Essentially this:

new_list = []
for i in range(len(xCoords)):
    new_list.append((xCoords[i], yCoords[i]))


In each iteration of that loop, it's indexing into the numpy arrays, 
making a python object out of them, putting them into a tuple, and 
appending that tuple to the list, which may have to re-allocate memory a 
few times.

Then the np.array() call loops through that list, unpacks each tuple, 
examines each python object, decides what it is, and turns it into a raw 
c-type to put into the array.

whereas:

def useEmpty(xCoords, yCoords):
  out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
  out[:,0] = xCoords
  out[:,1] = yCoords
  return out

allocates an array the right size.
directly copies the data from xCoords and yCoords to it.

that's it.

You can see why it's so much faster!
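
A minimal sketch of the comparison above, assuming Python 3 and a recent NumPy (the thread itself predates both). In Python 3 zip returns an iterator, so it is wrapped in list() before being passed to np.array. The names use_zip and use_empty and the sample size are illustrative only, and no timings are quoted here because they depend on the machine.

```python
import timeit

import numpy as np


def use_zip(x, y):
    # Builds one Python tuple per row, then np.array inspects every element.
    return np.array(list(zip(x, y)))


def use_empty(x, y):
    # Allocates the result once and copies each input in one bulk assignment.
    out = np.empty((len(x), 2), dtype=x.dtype)
    out[:, 0] = x
    out[:, 1] = y
    return out


rng = np.random.default_rng(0)
x = rng.normal(10, 1, 1000)
y = rng.normal(10, 1, 1000)

# Both approaches produce the same (1000, 2) array.
assert np.array_equal(use_zip(x, y), use_empty(x, y))

t_zip = timeit.timeit(lambda: use_zip(x, y), number=10)
t_empty = timeit.timeit(lambda: use_empty(x, y), number=10)
print("zip: %0.4f s, empty: %0.4f s" % (t_zip, t_empty))
```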

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Timing array construction

2009-04-30 Thread Mark Janikas
Thanks Eric!

I have a lot of array constructions in my code that use NUM.array([list of 
values])... I am going to replace it with the empty allocation and insertion.  
It is indeed twice as fast as "c_" (when it matters, i.e. N is relatively 
large):

N, "c_", "empty"
100, 0.0007, 0.0230
200, 0.0007, 0.0002
400, 0.0007, 0.0002
800, 0.0020, 0.0002
1600, 0.0009, 0.0003
3200, 0.0010, 0.0003
6400, 0.0013, 0.0005
12800, 0.0058, 0.0032

-----Original Message-----
From: numpy-discussion-boun...@scipy.org 
[mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Eric Firing
Sent: Wednesday, April 29, 2009 11:49 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Timing array construction

Mark Janikas wrote:
> Hello All,
> 
>  
> 
> I was exploring some different ways to concatenate arrays, and using 
> "c_" is the fastest by far.  Is there a difference I am missing that can 
> account for the huge disparity?  Obviously the "zip" function makes the 
> "as array" and "array" calls slower, but the same arguments (xCoords, 
> yCoords) are being passed to the methods... so if there is no difference 
> in the outputs (there doesn't appear to be) then what reason would I 
> have to use "array" or "as array" in this context?  Thanks so much ahead 
> of time..

If you really want speed, use something like this:

import numpy as np
def useEmpty(xCoords, yCoords):
 out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
 out[:,0] = xCoords
 out[:,1] = yCoords
 return out

It is quite a bit faster than using c_; more than a factor of two on my 
machine for all your test cases.

All your methods using zip and array are doing a lot of unpacking, 
repacking, checking, iterating... Even the c_ method is slower than it 
needs to be for this case because it is more general and flexible.
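
As a sketch of how the preallocate-and-fill idea could be extended to any number of columns: fill_columns below is a hypothetical helper, not part of NumPy and not something proposed in the thread. It assumes every input is a 1-D array of the same length and uses np.result_type to pick a dtype that can hold all of them.

```python
import numpy as np


def fill_columns(*cols):
    # Hypothetical helper: preallocate once, then copy each 1-D input into its column.
    # np.result_type promotes mixed inputs (e.g. int and float) to a common dtype.
    n = len(cols[0])
    out = np.empty((n, len(cols)), dtype=np.result_type(*cols))
    for j, col in enumerate(cols):
        out[:, j] = col
    return out


x = np.array([1, 2, 3])
y = np.array([0.5, 1.5, 2.5])
z = np.array([10.0, 20.0, 30.0])

result = fill_columns(x, y, z)
assert result.shape == (3, 3)
assert np.array_equal(result, np.column_stack((x, y, z)))
```

The loop runs once per column rather than once per element, so the per-element work still happens inside NumPy's bulk assignment.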

Eric
> 
>  
> 
> MJ
> 
>  
> 
> ## Snippet ###
> import numpy as NUM
>
> def useAsArray(xCoords, yCoords):
>     return NUM.asarray(zip(xCoords, yCoords))
>
> def useArray(xCoords, yCoords):
>     return NUM.array(zip(xCoords, yCoords))
>
> def useC(xCoords, yCoords):
>     return NUM.c_[xCoords, yCoords]
>
> if __name__ == "__main__":
>     from timeit import Timer
>     import numpy.random as RAND
>     import collections as COLL
>
>     resAsArray = COLL.defaultdict(float)
>     resArray = COLL.defaultdict(float)
>     resMat = COLL.defaultdict(float)
>     numTests = 0.0
>     sameTests = 0.0
>     N = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
>     for i in N:
>         print "Time Join List into Array for N = " + str(i)
>         xCoords = RAND.normal(10, 1, i)
>         yCoords = RAND.normal(10, 1, i)
>
>         statement = 'from __main__ import xCoords, yCoords, useAsArray'
>         t1 = Timer('useAsArray(xCoords, yCoords)', statement)
>         resAsArray[i] = t1.timeit(10)
>
>         statement = 'from __main__ import xCoords, yCoords, useArray'
>         t2 = Timer('useArray(xCoords, yCoords)', statement)
>         resArray[i] = t2.timeit(10)
>
>         statement = 'from __main__ import xCoords, yCoords, useC'
>         t3 = Timer('useC(xCoords, yCoords)', statement)
>         resMat[i] = t3.timeit(10)
>
>     for n in N:
>         print "%i, %0.4f, %0.4f, %0.4f" % (n, resAsArray[n],
>                                            resArray[n], resMat[n])
> ###
> 
>  
> 
> RESULT
>
> N, useAsArray, useArray, useC
> 100, 0.0066, 0.0065, 0.0007
> 200, 0.0137, 0.0140, 0.0008
> 400, 0.0277, 0.0288, 0.0007
> 800, 0.0579, 0.0577, 0.0008
> 1600, 0.1175, 0.1289, 0.0009
> 3200, 0.2291, 0.2309, 0.0012
> 6400, 0.4561, 0.4564, 0.0013
> 12800, 0.9218, 0.9122, 0.0019
> 
>  
> 
>  
> 
> Mark Janikas
> 
> Product Engineer
> 
> ESRI, Geoprocessing
> 
> 380 New York St.
> 
> Redlands, CA 92373
> 
> 909-793-2853 (2563)
> 
> mjani...@esri.com 
> 
> 
> 
> 



Re: [Numpy-discussion] Splitting multiarray and umath into smaller files: ready for review

2009-04-30 Thread David Cournapeau
2009/4/28 Charles R Harris:
>
>
> I think some of the fixes for too big arrays should be backported to 1.3.x
> before this is merged.
> That's r6851 and r6853. I'll do that.

Ok, I put the changes in the trunk. I will add some documentation as well,

cheers,

David