On Tue, Jun 1, 2010 at 9:57 PM, Zachary Pincus <[email protected]> wrote:
>> I guess it's as fast as I'm going to get. I don't really see any
>> other way. (BTW, the lat/lons are integers.)
>
> You could (in C or Cython) try a brain-dead "hashtable" with no
> collision detection:
>
> for lat, lon, data in dataset:
>     bin = (lat ^ lon) % num_bins
>     hashtable[bin] = update_incremental_mean(hashtable[bin], data)
>
> You'll of course want to do some experiments to see if your data are
> sufficiently sparse and/or you can afford a large enough hashtable
> array that you won't get spurious hash collisions. Adding error
> checking to ensure that there are no collisions would be pretty
> trivial (just keep a table of the lat/lon for each hash value, which
> you'll need anyway, and check that different lat/lon pairs don't get
> assigned the same bin).
>
> Zach
>
>> -Mathew
>>
>> On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus <[email protected]> wrote:
>> > Hi
>> > Can anyone think of a clever (non-looping) solution to the following?
>> >
>> > I have a list of latitudes, a list of longitudes, and a list of data
>> > values. All lists are the same length.
>> >
>> > I want to compute an average of data values for each lat/lon pair,
>> > e.g. if lat[1001], lon[1001] == lat[2001], lon[2001] then
>> > data[1001] = (data[1001] + data[2001]) / 2
>> >
>> > Looping is going to take wayyyy too long.
>>
>> As a start, are the "equal" lat/lon pairs exactly equal (i.e. either
>> not floating-point, or floats that will always compare equal, that
>> is, the floating-point bit patterns are guaranteed to be identical)
>> or only approximately equal, to float tolerance?
>>
>> If you're in the approx-equal case, then look at the KD-tree in scipy
>> for doing near-neighbors queries.
>>
>> If you're in the exact-equal case, you could consider hashing the
>> lat/lon pairs or something. At least then the looping is O(N) and not
>> O(N^2):
>>
>> import collections
>> import numpy
>>
>> grouped = collections.defaultdict(list)
>> for lt, ln, da in zip(lat, lon, data):
>>     grouped[(lt, ln)].append(da)
>>
>> averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items())
>>
>> Is that fast enough?
>>
>> Zach
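For concreteness, here is a minimal runnable Python rendering of the
hash-bin sketch quoted above; the toy arrays, the choice of num_bins,
and the setdefault-based collision check are illustrative assumptions,
not code from the thread:

import numpy as np

num_bins = 2 ** 16  # assumption: large enough that collisions are rare

lat = np.array([10, 20, 10, 30])      # toy integer lat/lon data
lon = np.array([5, 6, 5, 7])
data = np.array([1.0, 2.0, 3.0, 4.0])

table = {}   # bin -> (count, running mean)
pairs = {}   # bin -> the lat/lon pair that owns it (collision check)
for lt, ln, da in zip(lat, lon, data):
    b = (lt ^ ln) % num_bins
    if pairs.setdefault(b, (lt, ln)) != (lt, ln):
        raise ValueError("hash collision: increase num_bins")
    n, m = table.get(b, (0, 0.0))
    table[b] = (n + 1, m + (da - m) / (n + 1))  # incremental mean update

Each entry of table then holds the count and mean for one distinct
lat/lon pair.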
If the lat/lon pairs can be converted to a 1-D label as Wes suggested,
then in a similar timing exercise ndimage was the fastest:
http://mail.scipy.org/pipermail/scipy-user/2009-February/019850.html

(This was for Python 2.4; later I also found np.bincount, which
requires that the labels are consecutive integers, but it is as fast
as ndimage.)

I don't know how it would compare to the new suggestions.

Josef
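A minimal sketch of the 1-D-label-plus-bincount approach Josef
describes; the key construction and the np.unique step for producing
consecutive integer labels are assumptions about how the conversion
would be done:

import numpy as np

lat = np.array([10, 20, 10, 30])      # toy integer data
lon = np.array([5, 6, 5, 7])
data = np.array([1.0, 2.0, 3.0, 4.0])

# Combine lat/lon into one integer key (assumes abs(lon) < 50000 so
# distinct pairs map to distinct keys), then relabel the keys as the
# consecutive integers 0..k-1 that np.bincount requires.
keys = lat.astype(np.int64) * 100000 + lon
_, labels = np.unique(keys, return_inverse=True)

# Per-group sums divided by per-group counts give per-pair means.
means = np.bincount(labels, weights=data) / np.bincount(labels)

# Broadcast back so each element carries its pair's average.
data_avg = means[labels]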
