For largish datasets, knn1 in package class (in the recommended VR bundle) is probably the quickest way to do this. Something like

    # knn1(train, test, cl): D2 supplies the candidate points and their IDs
    knn1(D2[, 1:2], D1[, 1:2], D2$ID)

On Wed, 30 Jul 2003, Roger Bivand wrote:

> On Wed, 30 Jul 2003, Steve Sullivan wrote:
>
> > I'm trying to do the following:
> >
> > For each ordered pair of a data frame (D1) containing longitudes and
> > latitudes and unique point IDs, calculate the distance to every point in
> > another data frame (D2) also containing longitudes, latitudes and point
> > IDs, and return to a new variable in D1 the point ID of the nearest
> > element of D2.
>
> I think you can get quite a long way with the function rdist.earth() in
> the fields package:
>
> > loc1 <- expand.grid(long=seq(-150,150,5), lat=seq(-70,70,5))
> > dim(loc1)
> [1] 1769    2
> > loc2 <- expand.grid(long=seq(-150,150,7.5), lat=seq(-70,70,7.5))
> > dim(loc2)
> [1] 779   2
> > dists <- rdist.earth(loc1, loc2)
> > id12 <- apply(dists, 1, which.min)
> > length(id12)
> [1] 1769
> > id21 <- apply(dists, 2, which.min)
> > length(id21)
> [1] 779
>
> using id12 and id21 to choose the point.ids if need be
>
> > loc2$point.id[id12]
>
> Roger
>
> > Dramatis personae (mostly self-explanatory):
> >
> > D1$long
> > D1$lat
> > D1$point.id
> > neighbor.id (to be created; for each ordered pair in D1 the point ID of
> > the nearest ordered pair in D2)
> > D2$long
> > D2$lat
> > D2$point.id
> > dist.geo (to be created)
> >
> > I've been attempting this with nested for() loops that step through each
> > ordered pair in D1, and for each ordered pair [i] in D1 create a vector
> > (dist.geo) the length of D2$lat (say) that contains the distance
> > calculated from every ordered pair in D2 to the current ordered pair [i]
> > of D1, assign a value for D1$neighbor.id[i] based on
> > D2$point.id[which.min(dist.geo)], and move on to the next ordered pair
> > of D1 to create another dist.geo, assign another neighbor.id, etc.
> >
> > There are no missings/NAs in any of the longs, lats or point.ids,
> > although advice on generalizing this to deal with them would be
> > appreciated.
> >
> > What I've been trying:
> >
> > neighbor.id <- vector(length=length(D1$lat))
> > dist.geo <- vector(length=length(D2$lat))
> > for(i in 1:length(neighbor.id)){
> >   for(j in 1:length(dist.geo)){
> >     dist.geo[j] <- D1$lat[i]-D2$lat[j]}
> >
> >   # Yes, I know that isn't the right formula, this is just a test
> >
> >   neighbor.id[i] <- D2$point.id[which.min(dist.geo)]}
> >
> > What I get is a neighbor.id of the appropriate length, but which
> > consists only of the same value repeated. Should I instead pass the
> > which.min(dist.geo) to a variable before exiting the inner (j) loop, and
> > reference that variable in place of which.min(dist.geo) in the last
> > line? Or is this whole approach wrongheaded?
> >
> > This should be elementary, I know, so I appreciate everyone's
> > forbearance.
> >
> > Steven Sullivan, Ph.D.
> > Senior Associate
> > The QED Group, LLC
> > 1250 Eye St. NW, Suite 802
> > Washington, DC 20005
> > [EMAIL PROTECTED]
> > 202.898.1910.x15 (v)
> > 202.898.0887 (f)
> > 202.421.8161 (m)
> >
> > ______________________________________________
> > [EMAIL PROTECTED] mailing list
> > https://www.stat.math.ethz.ch/mailman/listinfo/r-help

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
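
A note on the repeated value reported in the question: with the signed test formula D1$lat[i] - D2$lat[j], the smallest difference for every i comes from the D2 row with the largest latitude, so which.min() returns the same index each time. The following is only a minimal sketch in base R, not code from the thread. The toy values in D1 and D2 are made up, and the squared planar distance is an assumption chosen for illustration (it ignores the curvature of the earth, which rdist.earth() handles).

```r
# Hypothetical toy data; column names follow the thread (long, lat, point.id).
D1 <- data.frame(long = c(0, 10, 20), lat = c(0, 10, 20), point.id = 1:3)
D2 <- data.frame(long = c(1, 19, 9, 50), lat = c(1, 21, 9, 60),
                 point.id = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

# The test formula from the question, vectorised with outer():
# signed[i, j] is D1$lat[i] - D2$lat[j]. Its row minimum always sits at the
# D2 row with the largest latitude, whatever i is.
signed <- outer(D1$lat, D2$lat, "-")
bad <- D2$point.id[apply(signed, 1, which.min)]

# Squared planar distance instead (illustrative assumption only).
d2 <- outer(D1$long, D2$long, "-")^2 + outer(D1$lat, D2$lat, "-")^2

# For each row of D1, pick the ID of the D2 row with the smallest distance.
D1$neighbor.id <- D2$point.id[apply(d2, 1, which.min)]
```

Here bad holds the same ID for every row of D1, while D1$neighbor.id differs by row. The full distance matrix has nrow(D1) * nrow(D2) entries, so for large inputs a nearest-neighbour routine such as knn1 may be the more practical choice.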