I was able to adapt your solution using packages with which I am more familiar.
myres2 <- merge(option.trades, stock.trades, by = "timestamp", all = TRUE) myres2[,"stock.timestamp"] <- ifelse(is.na(myres2$stock.price), NA, myres2$timestamp) myres2$stock.timestamp <- as.POSIXct(myres2$stock.timestamp, origin = "1970-01-01") library(zoo) myres2$stock.price <- na.locf(myres2$stock.price) myres2$stock.timestamp <- na.locf(myres2$stock.timestamp) myres2 <- myres2[!is.na(myres2$option.price),] row.names(myres2) <- NULL all.equal(myres, myres2[,c(1, 2, 4, 3)]) # TRUE This calculation is indeed faster. Thanks for your help, Naresh > On Aug 8, 2023, at 5:39 AM, Eric Berger <ericjber...@gmail.com> wrote: > > Hi Naresh, > Perhaps the below is faster than your approach > > library(dplyr) > library(tidyr) > merge(option.trades, stock.trades, by="timestamp", all=TRUE) |> > dplyr::arrange(timestamp) |> > dplyr::mutate(stock.timestamp = > as.POSIXct(ifelse(is.na(option.price), timestamp, NA))) |> > tidyr::fill(stock.price, stock.timestamp) |> > dplyr::filter(!is.na(option.price)) |> > dplyr::select(1,2,4,3) -> > myres2 > > identical(myres, myres2) ## TRUE > > > On Tue, Aug 8, 2023 at 5:32 AM Naresh Gurbuxani > <naresh_gurbux...@hotmail.com> wrote: >> >> >> I have two dataframes, each with a column for timestamp. I want to >> merge the two dataframes such that each row from first dataframe >> is matched with the row in the second dataframe with most recent but >> preceding timestamp. Here is an example. >> >> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:22", >> "2023-08-07 10:25:33", "2023-08-07 10:28:41")), option.price = c(2.5, 2.7, >> 1.8)) >> >> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07 10:23:21", >> "2023-08-07 10:23:34", "2023-08-07 10:24:57", "2023-08-07 10:28:37", >> "2023-08-07 10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7)) >> >> stock.trades <- stock.trades[order(stock.trades$timestamp),] >> >> library(plyr) >> mystock.prices <- ldply(option.trades$timestamp, function(tstamp) >> tail(subset(stock.trades, timestamp <= tstamp), 1)) >> names(mystock.prices)[1] <- "stock.timestamp" >> myres <- cbind(option.trades, mystock.prices) >> >> This method works. But for large dataframes, it is very slow. Is there >> a way to speed up the merge? >> >> Thanks, >> Naresh >> >> ______________________________________________ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.