> I guess it's as fast as I'm going to get. I don't really see any > other way. BTW, the lat/lons are integers)
You could (in c or cython) try a brain-dead "hashtable" with no collision detection: for lat, long, data in dataset: bin = (lat ^ long) % num_bins hashtable[bin] = update_incremental_mean(hashtable[bin], data) you'll of course want to do some experiments to see if your data are sufficiently sparse and/or you can afford a large enough hashtable array that you won't get spurious hash collisions. Adding error- checking to ensure that there are no collisions would be pretty trivial (just keep a table of the lat/long for each hash value, which you'll need anyway, and check that different lat/long pairs don't get assigned the same bin). Zach > -Mathew > > On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus <zachary.pin...@yale.edu > > wrote: > > Hi > > Can anyone think of a clever (non-lopping) solution to the > following? > > > > A have a list of latitudes, a list of longitudes, and list of data > > values. All lists are the same length. > > > > I want to compute an average of data values for each lat/lon pair. > > e.g. if lat[1001] lon[1001] = lat[2001] [lon [2001] then > > data[1001] = (data[1001] + data[2001])/2 > > > > Looping is going to take wayyyy to long. > > As a start, are the "equal" lat/lon pairs exactly equal (i.e. either > not floating-point, or floats that will always compare equal, that is, > the floating-point bit-patterns will be guaranteed to be identical) or > approximately equal to float tolerance? > > If you're in the approx-equal case, then look at the KD-tree in scipy > for doing near-neighbors queries. > > If you're in the exact-equal case, you could consider hashing the lat/ > lon pairs or something. At least then the looping is O(N) and not > O(N^2): > > import collections > grouped = collections.defaultdict(list) > for lt, ln, da in zip(lat, lon, data): > grouped[(lt, ln)].append(da) > > averaged = dict((ltln, numpy.mean(da)) for ltln, da in > grouped.items()) > > Is that fast enough? > > Zach > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion