Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Vincent Schut
On 06/02/2010 04:52 AM, josef.p...@gmail.com wrote:
 On Tue, Jun 1, 2010 at 9:57 PM, Zachary Pincus zachary.pin...@yale.edu wrote:
 I guess it's as fast as I'm going to get. I don't really see any
 other way. (BTW, the lat/lons are integers)

 You could (in c or cython) try a brain-dead hashtable with no
 collision detection:

 for lat, long, data in dataset:
     bin = (lat ^ long) % num_bins
     hashtable[bin] = update_incremental_mean(hashtable[bin], data)
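(A rough Python sketch of the incremental-mean update the pseudocode above
assumes; update_incremental_mean is not an existing function, and the
preallocated counts/means arrays here are purely illustrative:)

import numpy as np

num_bins = 2 ** 20
counts = np.zeros(num_bins, dtype=np.int64)
means = np.zeros(num_bins, dtype=np.float64)

def update_bin(b, value):
    counts[b] += 1
    means[b] += (value - means[b]) / counts[b]   # running mean update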

 you'll of course want to do some experiments to see if your data are
 sufficiently sparse and/or you can afford a large enough hashtable
 array that you won't get spurious hash collisions. Adding error-
 checking to ensure that there are no collisions would be pretty
 trivial (just keep a table of the lat/long for each hash value, which
 you'll need anyway, and check that different lat/long pairs don't get
 assigned the same bin).

 Zach



 -Mathew

 On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus zachary.pin...@yale.edu wrote:
 Hi
 Can anyone think of a clever (non-looping) solution to the
 following?

 I have a list of latitudes, a list of longitudes, and a list of data
 values. All lists are the same length.

 I want to compute an average of data values for each lat/lon pair.
 e.g. if lat[1001], lon[1001] == lat[2001], lon[2001] then
 data[1001] = (data[1001] + data[2001])/2

 Looping is going to take way too long.

 As a start, are the equal lat/lon pairs exactly equal (i.e. either
 not floating-point, or floats that will always compare equal, that is,
 the floating-point bit-patterns will be guaranteed to be identical) or
 approximately equal to float tolerance?

 If you're in the approx-equal case, then look at the KD-tree in scipy
 for doing near-neighbors queries.

 If you're in the exact-equal case, you could consider hashing the lat/
 lon pairs or something. At least then the looping is O(N) and not
 O(N^2):

 import collections
 grouped = collections.defaultdict(list)
 for lt, ln, da in zip(lat, lon, data):
    grouped[(lt, ln)].append(da)

 averaged = dict((ltln, numpy.mean(da)) for ltln, da in
 grouped.items())

 Is that fast enough?

 If the lat lon can be converted to a 1d label as Wes suggested, then
 in a similar timing exercise ndimage was the fastest.
 http://mail.scipy.org/pipermail/scipy-user/2009-February/019850.html

And as you said your lats and lons are integers, you could simply do

ll = lat*1000 + lon

to get unique 'hashes' or '1d labels' for your lat/lon pairs, as a lat or
lon will never exceed 360 (degrees).

After that, either use the ndimage approach, or you could use
histogramming weighted by the data values and divide by the histogram
without weighting, or just loop.
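A minimal sketch of that histogram-weighting idea (array names and sizes are
illustrative, not taken from the thread):

import numpy as np

lat = np.random.randint(0, 360, 10000)
lon = np.random.randint(0, 360, 10000)
data = np.random.randn(10000)

ll = lat * 1000 + lon                                 # 1d label per lat/lon pair
edges = np.arange(ll.min(), ll.max() + 2)             # one bin per possible label
sums, _ = np.histogram(ll, bins=edges, weights=data)  # per-label sum of data
counts, _ = np.histogram(ll, bins=edges)              # per-label sample count
means = sums[counts > 0] / counts[counts > 0]         # mean per occupied label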

Vincent


 (this was for python 2.4, also later I found np.bincount which
 requires that the labels are consecutive integers, but is as fast as
 ndimage)

 I don't know how it would compare to the new suggestions.

 Josef




 Zach


Re: [Numpy-discussion] Technicalities of the SVN - GIT transition

2010-06-02 Thread David Cournapeau
On Wed, Jun 2, 2010 at 3:09 AM, Matthew Brett matthew.br...@gmail.com wrote:

 Maybe you could put up a list of those people whose emails you need to
 scrape (obviously without the email addresses) and ask for opt-out?
 Or opt-in if you think that's better?

So here is the list of authors:

abaecker
alan.mcintyre
ariver
cdavid
chanley
charris
chris.barker
chris.burns
christoph.weidemann
cookedm
darren.dale
dhuard
dmorrill
edschofield
ej
eric
fperez
ilan
jarrod.millman
jmiller
joe
jswhit
kern
madhusudancs
martin
matthew.brett
mdroe
nobody
oliphant
patmiller
paul.ivanov
pearu
peridot
pierregm
prabhu
ptvirtan
rc
rgommers
rkern
sasha
skip
stefan
test
tim_hochberg
timl
travis
travo
wfspotz

(for wfspotz and matthew.brett, I removed the email suffix). I guess
rkern and kern are the same person, as well as travo/oliphant.

cheers,

David


Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Benjamin Root
Why not simply use a set?

uniquePoints = set(zip(lats, lons))

Ben Root

On Wed, Jun 2, 2010 at 2:41 AM, Vincent Schut sc...@sarvision.nl wrote:

 [snip]


[Numpy-discussion] 2d binning on regular grid

2010-06-02 Thread Andreas Hilboll
Hi there,

I'm interested in the solution to a special case of the parallel thread
'2D binning', which is going on at the moment. My data is on a fine global
grid, say .125x.125 degrees. I'm looking for a way to do calculations on
coarser grids, e.g.

* calculate means()
* calculate std()
* ...

on a, say, 2.5x3.75 degree grid. One very crude approach would be to
iterate through latitudes and longitudes, like this:


import numpy as np
import numpy.ma as ma

latstep_orig = .125
lonstep_orig = .125
data_orig = np.arange(int(180./latstep_orig) * int(360./lonstep_orig),
                      dtype=float).reshape(int(180./latstep_orig),
                                           int(360./lonstep_orig))
latstep_new = 2.5
lonstep_new = 3.75
latstep = int(latstep_new / latstep_orig)
lonstep = int(lonstep_new / lonstep_orig)

print 'one new lat equals', latstep, 'old lats'
print 'one new lon equals', lonstep, 'old lons'

result = ma.zeros((int(180./latstep_new), int(360./lonstep_new)))
latidx = 0
while latidx*latstep_new < 180.:
    lonidx = 0
    while lonidx*lonstep_new < 360.:
        m = np.mean(
            data_orig[latidx*latstep:(latidx+1)*latstep,
                      lonidx*lonstep:(lonidx+1)*lonstep])
        result[latidx, lonidx] = m
        lonidx += 1
    latidx += 1

However, this is very crude, and I was wondering if there's any more
elegant way to do it ...
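(For comparison, a loop-free sketch based on reshape, assuming the coarse
cells evenly divide the fine grid, as 2.5/0.125 = 20 and 3.75/0.125 = 30 do:)

import numpy as np

nlat, nlon = int(180 / .125), int(360 / .125)          # 1440 x 2880 fine grid
data_orig = np.arange(nlat * nlon, dtype=float).reshape(nlat, nlon)
latstep, lonstep = 20, 30                              # fine cells per coarse cell

blocks = data_orig.reshape(nlat // latstep, latstep, nlon // lonstep, lonstep)
blocks = blocks.transpose(0, 2, 1, 3).reshape(nlat // latstep, nlon // lonstep, -1)
coarse_mean = blocks.mean(axis=-1)                     # 72 x 96 grid of block means
coarse_std = blocks.std(axis=-1)                       # same shape, per-block std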

Thanks for your insight!

A.



Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Wes McKinney
On Wed, Jun 2, 2010 at 3:41 AM, Vincent Schut sc...@sarvision.nl wrote:
 [snip]


I was curious about how fast ndimage was for this operation so here's
the complete function.

import collections
import numpy as np
import scipy.ndimage as ndi

N = 1

lat = np.random.randint(0, 360, N)
lon = np.random.randint(0, 360, N)
data = np.random.randn(N)

def group_mean(lat, lon, data):
    indexer = np.lexsort((lon, lat))
    lat = lat.take(indexer)
    lon = lon.take(indexer)
    sorted_data = data.take(indexer)

    keys = 1000 * lat + lon
    unique_keys = np.unique(keys)

    result = ndi.mean(sorted_data, labels=keys, index=unique_keys)
    decoder = keys.searchsorted(unique_keys)

    return dict(zip(zip(lat.take(decoder), lon.take(decoder)), result))

Appears to be about 13x faster (and could be made faster still) than
the naive version on my machine:

def group_mean_naive(lat, lon, data):
    grouped = collections.defaultdict(list)
    for lt, ln, da in zip(lat, lon, data):
        grouped[(lt, ln)].append(da)

    averaged = dict((ltln, np.mean(da)) for ltln, da in grouped.items())

    return averaged

I had to get the latest scipy trunk to not get an error from ndimage.mean


Re: [Numpy-discussion] Technicalities of the SVN - GIT transition

2010-06-02 Thread Matthew Brett
Hi.

 Maybe you could put up a list of those people whose emails you need to
 scrape (obviously without the email addresses) and ask for opt-out?
 Or opt-in if you think that's better?

 So here is the list of authors:

Do y'all think opt-in?  Or opt-out?  If it's opt-in I guess you'll
catch most of the current committers, and most of the others you'll
lose, but maybe that's good enough.

See you,

Matthew


Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Mathew Yeates
thanks. I am also getting an error in ndi.mean
Were you getting the error
RuntimeError: data type not supported?

-Mathew


On Wed, Jun 2, 2010 at 9:40 AM, Wes McKinney wesmck...@gmail.com wrote:

 [snip]

Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Wes McKinney
On Wed, Jun 2, 2010 at 1:23 PM, Mathew Yeates mat.yea...@gmail.com wrote:
 thanks. I am also getting an error in ndi.mean
 Were you getting the error
 RuntimeError: data type not supported?

 -Mathew

 [snip]


Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Mathew Yeates
I'm on Windows, using a precompiled binary. I never built numpy/scipy on
Windows.

On Wed, Jun 2, 2010 at 10:45 AM, Wes McKinney wesmck...@gmail.com wrote:

 [snip]

Re: [Numpy-discussion] 2D binning

2010-06-02 Thread josef . pktd
On Wed, Jun 2, 2010 at 2:09 PM, Mathew Yeates mat.yea...@gmail.com wrote:
 I'm on Windows, using a precompiled binary. I never built numpy/scipy on
 Windows.

ndimage's measurements have recently been rewritten. ndimage is very fast,
but the old version has insufficient type checking and may crash on
wrong inputs.

I managed to work with it in the past on Windows. Maybe you could try
to check the dtypes of the arguments for ndi.mean. (Preferably in an
interpreter session where you don't mind if it crashes)
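A sketch of that kind of check (the arrays stand in for the ones passed to
ndi.mean in Wes's group_mean; the float64/int32 casts are illustrative
guesses, not documented requirements):

import numpy as np
import scipy.ndimage as ndi

sorted_data = np.random.randn(100)
keys = np.repeat(np.arange(10), 10)
unique_keys = np.unique(keys)

# see what is actually being passed in
print sorted_data.dtype, keys.dtype, unique_keys.dtype

# then try explicit, conservative dtypes
result = ndi.mean(sorted_data.astype(np.float64),
                  labels=keys.astype(np.int32),
                  index=unique_keys.astype(np.int32))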

I don't remember the type restrictions, but there are/were several
tickets for it.

Josef



 [snip]

Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Wes McKinney
On Wed, Jun 2, 2010 at 2:26 PM,  josef.p...@gmail.com wrote:
 [snip]

If you're on Python 2.6 the binary on here might work for you:

http://www.lfd.uci.edu/~gohlke/pythonlibs/

It looks recent enough to have the rewritten ndimage

Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Mathew Yeates
Nope. This version didn't work either.



 If you're on Python 2.6 the binary on here might work for you:

 http://www.lfd.uci.edu/~gohlke/pythonlibs/

 It looks recent enough to have the rewritten ndimage


[Numpy-discussion] masked constant

2010-06-02 Thread Pierre GM
I'm about to commit some changes in the numpy trunk regarding the masked 
constant. Namely, 'masked' is now its own object. It inherits from MaskedArray, 
but doesn't have a fill_value and is represented slightly differently:
>>> repr(ma.masked)
'masked'
The reason behind that is to reduce the headaches of new users (who tend to 
wonder why a masked item has a different fill_value than the original array 
(that's because it's a *constant*)).
Any objections ? (I could have silently committed that a while ago, but I don't 
want to mess w/ anybody's  supplies of chill-pills...)
P.
 


Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Christoph Gohlke


On 6/2/2010 2:32 PM, Mathew Yeates wrote:
 Nope. This version didn't work either.



 If you're on Python 2.6 the binary on here might work for you:

 http://www.lfd.uci.edu/~gohlke/pythonlibs/

 It looks recent enough to have the rewritten ndimage




Please note that in order to use the ndimage package from
http://www.lfd.uci.edu/~gohlke/pythonlibs/#ndimage you have to use
"import ndimage as ndi" instead of "import scipy.ndimage as ndi". The
code posted by Wes works for me on Windows after that change.

--
Christoph



Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Stephen Simmons

On 1/06/2010 10:51 PM, Wes McKinney wrote:
  [snip]
  This is a pretty good example of the group-by problem that will
  hopefully work its way into a future edition of NumPy.

Wes (or anyone else), please can you elaborate on any plans for groupby?

I've made my own modification to numpy.bincount for doing groupby-type
operations, but have not contributed anything back to the numpy community.
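(For reference, the plain bincount trick for a group mean looks roughly like
this; a sketch only, not Stephen's modification, and the labels must be
non-negative integers:)

import numpy as np

labels = np.array([0, 2, 1, 2, 0, 1])          # group label per observation
values = np.array([1., 2., 3., 4., 5., 6.])

sums = np.bincount(labels, weights=values)     # per-label sum
counts = np.bincount(labels)                   # per-label count
group_means = sums / counts                    # mean for labels 0, 1, 2, ...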

Is there any interest from European-based people to work on groupby etc 
at the Euro SciPy sprints in July? If others are interested, maybe we 
could work out requirements beforehand and then do some coding in Paris.

Cheers
Stephen



[Numpy-discussion] 2 bugs related to isinf and isfinite generate crazy warnings

2010-06-02 Thread Eric Firing
http://www.mail-archive.com/numpy-discussion@scipy.org/msg23912.html

On some systems--but evidently not for most numpy users, or there would 
have been a steady stream of screams--the appearance of np.inf in any 
call to np.isfinite or np.isinf yields this:

In [1]:import numpy as np

In [2]:np.isinf(np.inf)
Warning: invalid value encountered in isinf
Out[2]:True


This generates streams of warnings if np.isinf or np.isfinite is applied 
to an array with many inf values.

The problem is a combination of two bugs:

1) When building with setup.py, but perhaps not with scons (which I 
haven't tried yet), NPY_HAVE_DECL_ISFINITE and friends are never 
defined, even though they should be--this is all on ubuntu 10.4, in my 
case, and isfinite and isinf most definitely are in math.h.  It looks to 
me like the only mechanism for defining these is in SConstruct.

2) So, with no access to the built-in isinf etc., npy_math.h falls back 
on the compact but slow macros that replace the much nicer ones that 
were in numpy a year or so ago.  Here is the relevant part of npy_math.h:

#ifndef NPY_HAVE_DECL_ISFINITE
 #define npy_isfinite(x) !npy_isnan((x) + (-x))
#else
 #define npy_isfinite(x) isfinite((x))
#endif

#ifndef NPY_HAVE_DECL_ISINF
 #define npy_isinf(x) (!npy_isfinite(x) && !npy_isnan(x))
#else
 #define npy_isinf(x) isinf((x))
#endif

isinf calls isfinite, and isfinite calls isnan(x-x).  If x is inf, this 
generates a nan, so the return value is correct--but generating that nan 
sets the INVALID flag, hence the warning.
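The effect is easy to reproduce directly (a small illustration, not part of
the original report):

import numpy as np

x = np.array([np.inf])
with np.errstate(invalid='raise'):
    x + (-x)    # inf + (-inf) -> nan; computing it trips the INVALID flag,
                # so this raises FloatingPointError instead of just warning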

Sorry I don't have a patch to offer.

Eric


Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Wes McKinney
On Wed, Jun 2, 2010 at 6:18 PM, Stephen Simmons m...@stevesimmons.com wrote:

 [snip]


I know we're planning to discuss at SciPy in Austin and will hopefully
have a gameplan. We should coordinate the discussions / implementation
somewhere. I've implemented some groupby functionality in pandas but I
guess at a higher level than what's been proposed to add to NumPy.

On the pystatsmodels mailing list we've been starting to have some
discussions about statistically-oriented data structures in general--
I'll be sending an e-mail to the NumPy mailing list soon to start a
broader discussion which will hopefully lead to some good things at
the conferences.


Re: [Numpy-discussion] 2 bugs related to isinf and isfinite generate crazy warnings

2010-06-02 Thread Charles R Harris
On Wed, Jun 2, 2010 at 7:11 PM, Eric Firing efir...@hawaii.edu wrote:

 [snip]


Open a ticket. Since this looks fixable we should get it fixed for 1.5.

Chuck


Re: [Numpy-discussion] Technicalities of the SVN - GIT transition

2010-06-02 Thread Jarrod Millman
On Wed, Jun 2, 2010 at 10:08 AM, Matthew Brett matthew.br...@gmail.com wrote:
 Do y'all think opt-in?  Or opt-out?    If it's opt-in I guess you'll
 catch most of the current committers, and most of the others you'll
 lose, but maybe that's good enough.

I think it should be opt-in.  How would opt-out work?  Would someone
create new accounts for all the contributors and then give them
access?  If that is the case, I would really rather avoid this.  If
someone wants an account on github, they should register an account on
their own.  Or am I missing something?

Thanks,
Jarrod


Re: [Numpy-discussion] Technicalities of the SVN - GIT transition

2010-06-02 Thread David
On 06/03/2010 10:24 AM, Jarrod Millman wrote:
 On Wed, Jun 2, 2010 at 10:08 AM, Matthew Brett matthew.br...@gmail.com wrote:
 Do y'all think opt-in?  Or opt-out?  If it's opt-in I guess you'll
 catch most of the current committers, and most of the others you'll
 lose, but maybe that's good enough.

 I think it should be opt-in.  How would opt-out work?  Would someone
 create new accounts for all the contributors and then give them
 access?

Just to be clear, this has nothing to do with accounts on github, or any 
registered thing. This is *only* about username/email as recognized by 
git itself (as recorded in the commit objects).

Several people already answered me privately to give me their email, I 
think that's the way to go,

cheers,

David


Re: [Numpy-discussion] 2 bugs related to isinf and isfinite generate crazy warnings

2010-06-02 Thread Charles R Harris
On Wed, Jun 2, 2010 at 7:11 PM, Eric Firing efir...@hawaii.edu wrote:

 [snip]


I'm guessing that the problem is that in gcc isinf is a macro, the relevant
functions are actually  __isinff, __isinf, and __isinfl.


The library itself yields:

$[char...@ubuntu lib]$ nm -D libc.so.6 | grep isinf
00032d10 T __isinf
00033110 T __isinff
00033420 T __isinfl
00032d10 W isinf
00033110 W isinff
00033420 W isinfl


Where the symbols without the underscores are weak. I have no idea what the
implications of that are ;)

Chuck


Re: [Numpy-discussion] 2 bugs related to isinf and isfinite generate crazy warnings

2010-06-02 Thread David
On 06/03/2010 10:11 AM, Eric Firing wrote:
 [snip]

 The problem is a combination of two bugs:

 1) When building with setup.py, but perhaps not with scons (which I
 haven't tried yet), NPY_HAVE_DECL_ISFINITE and friends are never
 defined, even though they should be--this is all on ubuntu 10.4, in my
 case, and isfinite and isinf most definitely are in math.h.  It looks to
 me like the only mechanism for defining these is in SConstruct.

Actually, there is a bug in setup.py to detect those for python >= 2.6,
I have fixed this.


 2) So, with no access to the built-in isinf etc., npy_math.h falls back
 on the compact but slow macros that replace the much nicer ones that
 were in numpy a year or so ago.

Which one are you thinking about: the ones using fpclassify ? Could you 
show the code where the current version is slower ?

cheers,

David


Re: [Numpy-discussion] 2 bugs related to isinf and isfinite generate crazy warnings

2010-06-02 Thread Charles R Harris
On Wed, Jun 2, 2010 at 7:11 PM, Eric Firing efir...@hawaii.edu wrote:

 [snip]


More precisely, it looks like  HAVE_DECL_ISINF and friends are defined in
Python.h, hence NPY_HAVE_DECL_ISINF doesn't get defined. Hmm... So maybe

Chuck


Re: [Numpy-discussion] 2 bugs related to isinf and isfinite generate crazy warnings

2010-06-02 Thread Charles R Harris
On Wed, Jun 2, 2010 at 9:00 PM, David da...@silveregg.co.jp wrote:

 [snip]

  Actually, there is a bug in setup.py to detect those for python >= 2.6,
  I have fixed this.


Beat me to it ;)

Chuck


Re: [Numpy-discussion] 2 bugs related to isinf and isfinite generate crazy warnings

2010-06-02 Thread Eric Firing
On 06/02/2010 05:00 PM, David wrote:
 [snip]

 Actually, there is a bug in setup.py to detect those for python >= 2.6,
 I have fixed this.

David,

Thank you.



 2) So, with no access to the built-in isinf etc., npy_math.h falls back
 on the compact but slow macros that replace the much nicer ones that
 were in numpy a year or so ago.

 Which one are you thinking about: the ones using fpclassify ? Could you
 show the code where the current version is slower ?

OK, my memory was off as usual.  What I was remembering was actually this:

http://currents.soest.hawaii.edu/hgstage/hgwebdir.cgi/mpl_hg/file/4ab6f2159bea/matplotlib/src/MPL_isnan.h

So the macro versions were in mpl, and derived from numarray, but 
apparently were never in numpy.

In any case, in MPL_isnan.h, isfinite is fast, because it involves only 
a single bitwise-and plus comparison; isnan and isinf are slower, 
because they involve about twice as much work.
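For concreteness, here is the kind of single-mask test being described,
written in numpy terms rather than C (an illustration, not the MPL_isnan.h
code itself): a double is finite iff its exponent bits are not all ones.

import numpy as np

x = np.array([1.0, 0.0, np.inf, -np.inf, np.nan])
bits = x.view(np.int64)                     # reinterpret the IEEE-754 bits
exp_mask = np.int64(0x7ff0000000000000)     # the 11 exponent bits
finite = (bits & exp_mask) != exp_mask      # same result as np.isfinite(x)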

Given that the present numpy isfinite macro is broken--it raises an fp 
error--there is not much point in talking about its performance.

What I don't know is whether there is some reason not to use the 
MPL_isnan.h macros as the backups in numpy, for platforms that do not 
have their own isnan, isfinite, and isinf.

I do know (or think...) that two or three years ago I timed numpy.isnan 
and numpy.isfinite, and found that the latter was notably faster than 
the former, consistent with what I would have expected if the 
implementation was as in MPL_isnan.h.  Presumably, at that time my numpy 
build was using the system versions, and maybe they are implemented in a 
similar way.  Looking in /usr/include/bits/*.h, I gave up trying to 
figure out what the system macros really are doing.

Eric



 cheers,

 David


Re: [Numpy-discussion] 2 bugs related to isinf and isfinite generate crazy warnings

2010-06-02 Thread David
On 06/03/2010 01:02 PM, Eric Firing wrote:


 In any case, in MPL_isnan.h, isfinite is fast, because it involves only
 a single bitwise-and plus comparison; isnan and isinf are slower,
 because they involve about twice as much work.

Concerning speed, the problem is that it depends a lot on the 
compiler/architecture. When I timed some implementations of isnan, just 
changing from Pentium 4 to Pentium M with the exact same 
compiler/options gave mixed results. It also depends on the input - some 
methods are faster for finite numbers but slower for nan/inf, etc...


 Given that the present numpy isfinite macro is broken--it raises an fp
 error--there is not much point in talking about its performance.

Indeed :)


 What I don't know is whether there is some reason not to use the
 MPL_isnan.h macros as the backups in numpy, for platforms that do not
 have their own isnan, isfinite, and isinf.

The problem is that it is actually quite some work. Float and double are 
not so difficult (two cases to deal with for endianness), but long 
double is a huge PITA - we have already 5 different implementations to 
deal with, and we are missing some (linux PPC).

 Presumably, at that time my numpy
 build was using the system versions, and maybe they are implemented in a
 similar way.  Looking in /usr/include/bits/*.h, I gave up trying to
 figure out what the system macros really are doing.

You have to look into sysdeps/ directory inside glibc sources (on 
Linux). They use clever tricks to avoid branching, but when I 
benchmarked those, things like x != x were much faster on the computer
I tested this on. Most likely very CPU dependent.
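(The x != x test, for reference -- NaN is the only value that compares
unequal to itself:)

import numpy as np

x = np.array([1.0, np.nan, np.inf])
nan_mask = x != x      # array([False,  True, False]), same as np.isnan(x)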

cheers,

David