[Numpy-discussion] 2d binning and linear regression

2010-06-20 Thread Tom Durrant
Hi All,

I have a problem involving lat/lon data.  Basically, I am evaluating
numerical weather model data against satellite data, and trying to produce
gridded plots of various statistics.  There are various steps involved with
this, but basically, I get to the point where I have four arrays of the same
length, containing lat, lon, model and observed values respectively along
the path of the satellite.

eg:

lat   = [  50.32   50.78   51.24   51.82   52.55   53.15   53.75   54.28   54.79   55.16  ... ]
lon   = [ 192.83  193.38  193.94  194.67  195.65  196.49  197.35  198.15  198.94  199.53  ... ]
obs   = [   1.42    1.35    1.55    1.50    1.59    1.76    2.15    1.90    1.55    0.73  ... ]
model = [   1.67    1.68    1.70    1.79    2.04    2.36    2.53    2.38    2.149   1.57  ... ]

I then want to calculate statistics based on bins of say 2 X 2 degree boxes.
These arrays are very large, on the order of 10^6. For ease of explanation,
I will focus initially on bias.

My first approach was to use loops through lat/lon boxes and use np.where
statements to extract all the values of the model and observations within
the given box, and calculate the bias as the mean of the difference.  This
was obviously very slow.

histogram2d provided a FAR better way to do this.  i.e.


import numpy as np

# np.arange excludes the stop value, so go one step past the last edge
# (90 and 360) to cover the full globe.
latedges = np.arange(-90, 92, 2)
lonedges = np.arange(0, 362, 2)

diff = model - obs
grid_N, xedges, yedges = np.histogram2d(lat, lon,
                                        bins=(latedges, lonedges))
grid_bias_sum, xedges, yedges = np.histogram2d(lat, lon, weights=diff,
                                               bins=(latedges, lonedges))
grid_bias = grid_bias_sum / grid_N  # NaN (with a warning) for empty boxes


I now want to determine the linear regression coefficients for each box
after fitting a least-squares linear regression to the data in each bin.
I have been looking at using np.digitize to extract the bin indices, but
haven't had much success trying to do this in two dimensions.  I am back to
looping through the lat and lon box values, using np.where to extract the
observations and model data within that box, and using np.polyfit to obtain
the regression coefficients.  This is, of course, very slow.

Can anyone advise a smarter, vectorized way to do this?  Any thoughts would
be greatly appreciated.

Thanks in advance
Tom


Re: [Numpy-discussion] 2d binning and linear regression

2010-06-20 Thread josef.pktd
On Sun, Jun 20, 2010 at 4:24 AM, Tom Durrant thdurr...@gmail.com wrote:
 snip

 Can anyone advise a smarter, vectorized way to do this?  Any thoughts would
 be greatly appreciated.

For a general linear regression problem there isn't much, as far as I can
see, that can be done.

If there are many small regression problems, then stacking them into one
big sparse least-squares problem can sometimes help; it's faster to
solve, but not always faster to set up in the first place.
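
A minimal sketch of that stacking idea using scipy.sparse (the arrays
labels, x and y below are hypothetical stand-ins for the binned data, not
code from the original post):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

# Toy data: each point belongs to one of n_bins independent regressions.
rng = np.random.RandomState(0)
n, n_bins = 1000, 50
labels = rng.randint(0, n_bins, n)
x = rng.rand(n)
y = 2.0 * x + 1.0 + 0.1 * rng.randn(n)

# Block-diagonal design matrix: bin i owns columns 2*i (slope) and
# 2*i + 1 (intercept), so all the small fits become one sparse lstsq.
rows = np.concatenate([np.arange(n), np.arange(n)])
cols = np.concatenate([2 * labels, 2 * labels + 1])
vals = np.concatenate([x, np.ones(n)])
A = sp.coo_matrix((vals, (rows, cols)), shape=(n, 2 * n_bins)).tocsr()

coef = lsqr(A, y)[0]  # coef[2*i] is bin i's slope, coef[2*i+1] its intercept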

Are you doing something like np.polyfit(model, obs, 1)?

If you are using polyfit with deg=1, i.e. fitting a straight line,
then this could also be calculated using the weights in histogram2d.
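
For instance, a sketch of that approach: accumulate the per-box sums that
a straight-line fit needs with weighted histogram2d calls, then form the
closed-form least-squares slope and intercept (this assumes the lat, lon,
model, obs arrays and the bin edges from the original post):

import numpy as np

def binned(w=None):
    h, _, _ = np.histogram2d(lat, lon, bins=(latedges, lonedges), weights=w)
    return h

N   = binned()               # points per box
Sx  = binned(model)          # sum of x per box
Sy  = binned(obs)            # sum of y per box
Sxx = binned(model * model)  # sum of x**2 per box
Sxy = binned(model * obs)    # sum of x*y per box

# y = a*x + b in each box; boxes with fewer than 2 points come out as
# NaN (with a divide warning).
denom = N * Sxx - Sx ** 2
a = (N * Sxy - Sx * Sy) / denom
b = (Sy - a * Sx) / N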

histogram2d (histogramdd) uses np.digitize and np.bincount internally, so
I'm surprised that the histogram2d version is much faster. If a quick
reading of histogramdd is correct, the main improvement would be to get
the labels xy out of it, so they can be used repeatedly with np.bincount.
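
Roughly, getting the flat labels out once and reusing them might look like
this (a sketch; out-of-range points and edge conventions are glossed over):

import numpy as np

lat_idx = np.digitize(lat, latedges)   # 0 .. len(latedges)
lon_idx = np.digitize(lon, lonedges)
nlon = len(lonedges) + 1
labels = lat_idx * nlon + lon_idx      # one flat 2-d box label per point

counts = np.bincount(labels)
bias_sum = np.bincount(labels, weights=model - obs)
# `labels` can now be reused with any other weights, with no re-binning,
# then padded/reshaped back onto the 2-d grid as needed.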

Josef






[Numpy-discussion] Multiplying numpy floats and python lists

2010-06-20 Thread Tony S Yu
I came across some strange behavior when multiplying numpy floats and python 
lists: the list is returned unchanged:

In [18]: np.float64(1.2) * [1, 2]
Out[18]: [1, 2]

On the other hand, multiplying an array scalar and a python list gives the
expected answer:

In [19]: np.array(1.2) * [1, 2]
Out[19]: array([ 1.2,  2.4])

Finally, multiplying a python float and a python list gives an error:

In [20]: 1.2 * [1, 2]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/Tony/<ipython console> in <module>()

TypeError: can't multiply sequence by non-int of type 'float'

These last two results (with the array scalar and the python float) both seem 
appropriate, but the numpy-float behavior is surprising. Is this a bug?

-Tony


Re: [Numpy-discussion] Multiplying numpy floats and python lists

2010-06-20 Thread Warren Weckesser
Tony S Yu wrote:
 I came across some strange behavior when multiplying numpy floats and
 python lists: the list is returned unchanged:

 In [18]: np.float64(1.2) * [1, 2]
 Out[18]: [1, 2]

Apparently the np.float64 object is cast to an int, and Python's * is used:

In [7]: np.float64(2.5) * [1,2]
Out[7]: [1, 2, 1, 2]

In [8]: np.float64(3.8) * [1,2]
Out[8]: [1, 2, 1, 2, 1, 2]
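
As a quick check of that reading, the float appears to be truncated toward
zero before the list repetition (a sketch, not the C-level code path):

import numpy as np

n = int(np.float64(3.8))  # truncation toward zero gives 3
print(n * [1, 2])         # [1, 2, 1, 2], matching Out[8] above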


Warren




Re: [Numpy-discussion] Multiplying numpy floats and python lists

2010-06-20 Thread Pauli Virtanen
On Sun, 2010-06-20 at 13:56 -0400, Tony S Yu wrote:
 I came across some strange behavior when multiplying numpy floats and
 python lists: the list is returned unchanged:
 
 In [18]: np.float64(1.2) * [1, 2]
 Out[18]: [1, 2]

Probably a bug: it seems to truncate the result to an integer first, and
then do the usual Python thing (try for example np.float64(2.2)).

I'd suggest filing a bug ticket.

Looking at CPython source code, the issue seems to be that np.float64
for some reason passes PyIndex_Check.

But (without whipping out gdb) I can't understand how this can be: the
float64 scalar type is not supposed to have a non-NULL in the nb_index
slot. Indeed, an np.float64.__index__ method does not exist, and
accordingly the following does not work:

[1, 2, 3][np.float64(2.2)]

Strange.

-- 
Pauli Virtanen



[Numpy-discussion] _wrapit bug?

2010-06-20 Thread Benjamin Root
Hi, I was just digging through the fromnumeric.py code when I noticed that
the function _wrapit uses asarray() rather than asanyarray().  I don't know
if this function really gets used anymore (it only gets called in special
situations), but it would seem to be an issue, as it would cause arrays
such as masked arrays to lose their type.  For example, if _wrapit were
called for the max() function on a masked array, the max returned would be
the ndarray's max, not the masked array's max.
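
A small illustration of why the asarray/asanyarray distinction matters here
(a sketch, not the actual _wrapit code path):

import numpy as np
import numpy.ma as ma

m = ma.masked_array([1.0, 2.0, 9.0], mask=[False, False, True])
print(np.asanyarray(m).max())  # 2.0: MaskedArray preserved, mask honored
print(np.asarray(m).max())     # 9.0: plain ndarray, masked value leaks in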

Let me know if this is a non-issue; I just thought I ought to point it out.

Thanks,
Ben Root


Re: [Numpy-discussion] Multiplying numpy floats and python lists

2010-06-20 Thread Tony S Yu

On Jun 20, 2010, at 2:28 PM, Pauli Virtanen wrote:

 On Sun, 2010-06-20 at 13:56 -0400, Tony S Yu wrote:
 snip

 I'd suggest filing a bug ticket.
Done: http://projects.scipy.org/numpy/ticket/1516

 


Re: [Numpy-discussion] Building Numpy: could not read symbols Bad value error

2010-06-20 Thread Michael Green
On Sun, Jun 20, 2010 at 1:19 AM, Charles R Harris
charlesr.har...@gmail.com wrote:

 Have you checked the actual configuration that was used to compile lapack,
 *not* the command line flags? I never had success with the automatic
 builds using the compressed files. Also check the actual installed
 libraries.

 snip

Yeah, the make.inc does have the -fPIC option:

FORTRAN  = gfortran
OPTS = -O2 -fPIC
DRVOPTS  = $(OPTS)
NOOPT= -O0 -fPIC
LOADER   = gfortran
LOADOPTS =

And of course I see that -fPIC is actually being used:
...
gfortran  -O2 -fPIC -c zsysvx.f -o zsysvx.o
gfortran  -O2 -fPIC -c zsytf2.f -o zsytf2.o
gfortran  -O2 -fPIC -c zsytrf.f -o zsytrf.o
...
--
Warm regards,
Michael Green


Re: [Numpy-discussion] Compiling NumPy on 64 bit Solaris 10 (sparc)

2010-06-20 Thread Chris LeBlanc
Hi Ralf,

Thanks for the information.  I'll have a look at it, and see if it
solves the issue.

Cheers,
Chris

On 18 June 2010 01:53, Ralf Gommers ralf.gomm...@googlemail.com wrote:


 On Tue, Jun 15, 2010 at 11:06 AM, Chris LeBlanc crlebl...@gmail.com wrote:

 Hi,

 Firstly, thanks to everyone who has helped bring NumPy to the point it
 is at today.  It's an excellent bit of software.

 I've recently managed to get NumPy to compile on a 64 bit Solaris 10
 (sparc) machine.  We've embedded 64 bit Python in some of our C code
 using the Python C-API, and therefore require a 64 bit version of
 NumPy.

 I had to use a bit of a hack to get NumPy to compile in 64 bit mode,
 and I'm just wondering if there's a more elegant way of doing it?

 If I set CFLAGS to "-m64 -xcode=pic32" and CC to /usr/bin/cc, it
 will raise the following error:

 ... snipped most of backtrace ...
  File "/home/leblanc/source/numpy/numpy/distutils/command/build_src.py",
 line 385, in generate_sources
     source = func(extension, build_dir)
  File "numpy/core/setup.py", line 675, in get_mathlib_info
     raise RuntimeError("Broken toolchain: cannot link a simple C program")

 However, if I create a wrapper script and set CC to the path of the
 wrapper, then it will build normally and create a NumPy module I can
 use from the 64 bit version of Python installed on the system
 (/usr/bin/sparcv9/python).  Here's the wrapper:

 #!/bin/sh
 /usr/bin/cc -m64 -xcode=pic32 $*

 I did a bit of googling and couldn't find much information on this
 issue (which is why I wanted to post this message, so others could
 find it).  I'm satisfied with the results, but wondering if there's a
 better way to do it.

 It's because CFLAGS does not append to but overrides flags. Your wrapper
 script manages to append flags.

 Does the patch in http://projects.scipy.org/numpy/ticket/1382 work for you?
 If so, I think it can be applied.

 Cheers,
 Ralf



Re: [Numpy-discussion] timeseries - dates prior to 1970

2010-06-20 Thread Bevan Jenkins
Pierre GM pgmdevlist at gmail.com writes:

 
 On Jun 10, 2010, at 7:16 PM, Bevan Jenkins wrote:
  Hello,
  
  I have posted previously about dates prior to 1900, but this seems to be a
  separate issue.  The error message is definitely different.
  I can not seem to convert a timeseries from one frequency ('D') to another
  ('H') when I use dates prior to 1970, as shown below.  This works fine when
  I use a date after 1970.  Is this something that can be easily fixed, or is
  there a workaround that I can use? Thanks
 
 I've started a git branch
 (http://github.com/pierregm/scikits.timeseries-sandbox) to test some
 changes on scikits.timeseries, in particular improved support of
 high-frequency dates before the unix epoch. Please give it a try, let me
 know how it goes. If all's fine, I'll port the changes to the SVN site
 (which until further notice remains the official source for the scikits).
 

I downloaded the git branch; I had to use the "Download
pierregm/scikits.timeseries-sandbox at master" zip file, as I have not
managed to get Git to work at Uni yet.
After deleting the old scikit, I built and installed the new version, but I
am still getting the same error.
I noticed you had some new tests dealing with high-freq dates before the
unix epoch.  When I run ts.test() I get

Ran 0 tests in 0.000s

OK

So I imagine I am doing something incorrectly.
 






Re: [Numpy-discussion] timeseries - dates prior to 1970

2010-06-20 Thread Pierre GM

On Jun 20, 2010, at 7:28 PM, Bevan Jenkins wrote:
 
 I downloaded the git branch; I had to use the "Download
 pierregm/scikits.timeseries-sandbox at master" zip file, as I have not
 managed to get Git to work at Uni yet.
 After deleting the old scikit, I built and installed the new version, but
 I am still getting the same error.

Note that it's a work in progress. I'll let you know when the problem you 
experienced is fixed.

 I noticed you had some new tests dealing with high-freq dates before the
 unix epoch.  When I run ts.test() I get
 Ran 0 tests in 0.000s
 
 OK
 
 So I imagine I am doing something incorrectly. 

Not necessarily. It works on my side (but I install using python setup.py
develop...); I'll try and see where it goes wrong.


Re: [Numpy-discussion] Building Numpy: could not read symbols Bad value error

2010-06-20 Thread David
On 06/21/2010 06:30 AM, Michael Green wrote:
 snip

 Yeah, the make.inc does have the -fPIC option:
 snip
 And of course I see that -fPIC is actually being used:
 ...
 gfortran  -O2 -fPIC -c zsytrf.f -o zsytrf.o
 ...

Can you check that it is actually used for the file which causes the link
failure (in atlas)?

David


Re: [Numpy-discussion] 2d binning and linear regression

2010-06-20 Thread Tom Durrant

 
 are you doing something like np.polyfit(model, obs, 1) ?

 If you are using polyfit with deg=1, i.e. fitting a straight line,
 then this could also be calculated using the weights in histogram2d.
 snip

 Josef
 
Thanks Josef, 

From my limited understanding, you are right: the histogram approach is much
faster because it doesn't have to keep reading the array over and over.

I am using np.polyfit(model, obs, 1).  I couldn't work out a way to do these
regressions using histogram2d and weights, but you think it can be done?
This would be great!

Cheers
Tom
 



