In thinking about this some more an SQL solution would be a bit easier. Try this where species and temp are from the prior post (i.e. after processing with chron).
The first statement finds the minimum distances for each species and station_id combination. The second finds all combinations of a record in species with a record in temp, i.e. outer join. The last sql statement merges those two over Species, station_id and dist. Note that value is a keyword in sql so its name is automatically transformed to value__1 to avoid a conflict. At the end we fix up the dt column so its chron again. library(sqldf) mins <- sqldf("select s.Species, s.dt, t.station_id, min(abs(s.dt - t.dt)) dist from species2 s, temp2 t group by species, station_id") every <- sqldf("select s.Species, s.dt, t.station_id, t.value__1, abs(s.dt - t.dt) dist from species2 s, temp2 t") merged <- sqldf("select Species, e.dt, station_id, value__1 from mins join every e using(Species, station_id, dist)") merged <- transform(merged, dt = chron(dt)) On Tue, Mar 17, 2009 at 8:13 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote: > I am assuming that for each species and station_id that you want > the value in that temp record whose date/time is closest to the > date/time in the species record along with the identifying information > (species, station_id) and date/time of the species. That interpretation > does give the same answer as in the fused data set you posted. > > First we read in temp and use chron to convert the date/times to > chron. Similarly we do that for species. > > Then we define a function which measures the "distance" between > two date/times and we define another function f which takes a > species rowname and merges that row with temp. Finally > we call lapply that function to species. > > library(chron) > > temp <- read.csv("temperature_data_Rexample.csv") > temp$dt <- as.chron(paste(temp$date, temp$hour), "%Y%m%d %H%M") > > species <- read.csv("species_data_Rexample.csv") > ds <- species$Date_Sampled > species$dt <- chron(sub(" .*", "", ds), gsub("[apm]+$|^.* ", "", ds)) + > (regexpr("pm", ds) > 0)/2 # add half a day if its pm > > mydist <- function(x, y) abs(as.numeric(x-y)) > > f <- function(r) { > s <- species[r, ] > out <- by(temp, temp$station_id, function(x) { > imin <- which.min(mydist(x$dt, s$dt)) > data.frame(Species = s$Species, Date = s$dt, > station_id = x[imin, "station_id"], value = x[imin, > "value"]) > }) > do.call(rbind, out) > } > > do.call(rbind, lapply(rownames(species), f)) > > Result of last line is: > > Species Date station_id value > 1 SpeciesB (06/23/08 13:55:11) ANH 2.25 > 2 SpeciesA (06/23/08 13:43:11) ANH 2.25 > 3 SpeciesC (06/23/08 13:55:11) ANH 2.25 > 4 SpeciesB (06/23/08 13:55:11) BDT 3.82 > 5 SpeciesA (06/23/08 13:43:11) BDT 3.90 > 6 SpeciesC (06/23/08 13:55:11) BDT 3.82 > > > > On Mon, Mar 16, 2009 at 7:41 PM, Jonathan Greenberg > <greenb...@ucdavis.edu> wrote: >> Weird -- the email was sent through my gmail account, looks like the .csvs >> got intercepted somewhere along the way. At any rate, I placed them on a >> website: >> >> http://cstars.ucdavis.edu/~jongreen/temp/temporal_join_R/ >> >> --j >> >> Gabor Grothendieck wrote: >>> >>> There was nothing attached. >>> >>> On Mon, Mar 16, 2009 at 3:11 PM, Jonathan Greenberg >>> <greenb...@ucdavis.edu> wrote: >>> >>>> >>>> Sorry for the immediate follow-up, but Phil Spector correctly reminded me >>>> this is a lot easier for the community I provide some sample data, so I'm >>>> attaching 3 small CSVs to this email: >>>> >>>> species_data_Rexample.csv contains the "field data" (which species was >>>> ID'd >>>> and what time it was ID'd), >>>> temperature_data_Rexample.csv contains the date, time, station ID and the >>>> temperature "value" >>>> >>>> I'd like a dataframe which contains for each unique line in >>>> species_data_Rexample.csv, a series of lines, one per station, and the >>>> temperature of the nearest time stamp, or an interpolated value (weighted >>>> average would be fine, but so would just grabbing the nearest value), so >>>> for >>>> this example I'd like something that looks like the csv >>>> "fused_data_Rexample.csv" >>>> >>>> Thanks! >>>> >>>> --j >>>> >>>> Jonathan Greenberg wrote: >>>> >>>>> >>>>> I've been playing with zoo a bit, and it seems ok except it doesn't >>>>> support non-unique time stamps when performing joins. I have two >>>>> databases >>>>> which contain a dataframe of a Date object (with the time, not just >>>>> MM/DD/YY), e.g.: >>>>> >>>>> DB 1: >>>>> UniqueID,Date1,Data 1,Data 2 >>>>> >>>>> DB 2: >>>>> Date2, Station, Data 3 >>>>> >>>>> We'll say Station can contain three values: A,B and C >>>>> >>>>> DB 1 may have some repeat times, and DB 2 definitely has them -- >>>>> although >>>>> each Date, Station combo is unique (this DB contains weather data >>>>> collected >>>>> on the half-hour or fifteen minute interval at a set of stations). I'd >>>>> like >>>>> DB2's station and Data3 to be joined with DB1 based on the nearest time >>>>> stamp (interpolating Data3 or not). >>>>> >>>>> Ideally, I'd like a fused database such that I get for each uniqueID in >>>>> DB1: >>>>> >>>>> UniqueID,Date,Data1,Data2,Station,Data3 >>>>> >>>>> Thoughts? Hints? >>>>> >>>>> --j >>>>> >>>>> >>>>> >>>>> >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> >>>> >> >> -- >> >> Jonathan A. Greenberg, PhD >> Postdoctoral Scholar >> Center for Spatial Technologies and Remote Sensing (CSTARS) >> University of California, Davis >> One Shields Avenue >> The Barn, Room 250N >> Davis, CA 95616 >> Cell: 415-794-5043 >> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307 >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.