Try the dtw package. First convert ref to numeric, since dtw does not handle character input. Then align the two sequences using dtw and NA out repeated values in the alignment. Finally zap the ugly row names and calculate the loading:
library(dtw)

# dtw needs numeric input, so code both ref columns on one common set of
# levels (works whether ref is a factor, the pre-R 4.0 default, or character)
lev <- levels(factor(stop_sequence$ref))
s1 <- as.numeric(factor(stop_sequence$ref, levels = lev))
s2 <- as.numeric(factor(as.character(stop_onoff$ref), levels = lev))

# align, then NA out repeats: when consecutive stops are matched to the same
# stop_onoff row, keep that row only at the first of them;
# [-3] drops the duplicate ref column coming from stop_onoff
a <- dtw(s1, s2)
DF <- cbind(stop_sequence,
            stop_onoff[replace(a$index2, c(FALSE, diff(a$index2) == 0), NA), ])[-3]
rownames(DF) <- NULL
transform(DF, loading = cumsum(ifelse(is.na(on), 0, on)) -
                        cumsum(ifelse(is.na(off), 0, off)))

giving:

  seq ref on off loading
1  10   A  5   0       5
2  20   B NA  NA       5
3  30   C NA  NA       5
4  40   D  0   2       3
5  50   B 10   2      11
6  60   A  0   6       5

You will need to test this with more data and tweak it if necessary via the various dtw arguments (step.pattern, window.type, etc.).

On Fri, Aug 29, 2014 at 8:46 PM, Adam Lawrence <alaw...@gmail.com> wrote:
> I am hoping someone can help me with a bus stop sequencing problem in R,
> where I need to match counts of people getting on and off a bus to the
> correct stop in the bus route stop sequence. I have tried looking
> online/forums for sequence matching but it seems to refer to numeric sequences
> or DNA matching and is over my head. I am after a simple example if anyone can
> please help.
>
> I have two data series as per below (from database), that I want to
> combine. In this example “stop_sequence” includes the sequence (seq) of bus
> stops and “stop_onoff” is a count of people getting on and off at certain
> stops (there is no entry if no one gets on or off).
>
> stop_sequence <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'))
> ##   seq ref
> ## 1  10   A
> ## 2  20   B
> ## 3  30   C
> ## 4  40   D
> ## 5  50   B
> ## 6  60   A
> stop_onoff <-
> data.frame(ref=c('A','D','B','A'),on=c(5,0,10,0),off=c(0,2,2,6))
> ##   ref on off
> ## 1   A  5   0
> ## 2   D  0   2
> ## 3   B 10   2
> ## 4   A  0   6
>
> I need to match the stop_onoff numbers to the right stop in the sequence, with the
> correctly matched output as follows (load is a cumulative count of on and
> off)
>
> desired_output <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'),
> on=c(5,'-','-',0,10,0),off=c(0,'-','-',2,2,6), load=c(5,0,0,3,11,5))
> ##   seq ref  on off load
> ## 1  10   A   5   0    5
> ## 2  20   B   -   -    0
> ## 3  30   C   -   -    0
> ## 4  40   D   0   2    3
> ## 5  50   B  10   2   11
> ## 6  60   A   0   6    5
>
> In this example the stop “B” is matched to the second stop “B” in the stop
> sequence and not the first because the onoff data is after stop “D”.
>
> Any guidance much appreciated.
>
> Regards
> Adam
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
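If the dtw package is not available, a rough base-R cross-check on the dtw alignment is a greedy forward scan. This is a different, simpler technique than dtw and only a sketch under stated assumptions: every stop_onoff row is assumed to appear in route order, and each one is assumed to belong to the earliest not-yet-passed stop with the same ref. The helper name match_in_order is hypothetical. On the example data it gives the same on/off placement and loading as the dtw output above, but because it always takes the earliest remaining match it can disagree with dtw when a ref repeats and an earlier occurrence ought to be skipped.

```r
# Hypothetical helper (a greedy in-order match, not dtw).
# ASSUMPTION: stop_onoff rows are in route order, and each belongs to the
# earliest remaining stop in the route with the same ref.
match_in_order <- function(route_ref, onoff_ref) {
  route_ref <- as.character(route_ref)
  onoff_ref <- as.character(onoff_ref)
  idx <- rep(NA_integer_, length(route_ref))  # stop_onoff row per stop, NA if none
  pos <- 1L                                   # first route position not yet passed
  for (j in seq_along(onoff_ref)) {
    hits <- which(route_ref == onoff_ref[j] & seq_along(route_ref) >= pos)
    if (length(hits) == 0L) stop("no remaining stop matches ref ", onoff_ref[j])
    idx[hits[1L]] <- j                        # take the earliest remaining match
    pos <- hits[1L] + 1L                      # never move backwards along the route
  }
  idx
}

stop_sequence <- data.frame(seq = c(10, 20, 30, 40, 50, 60),
                            ref = c("A", "B", "C", "D", "B", "A"))
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"),
                         on = c(5, 0, 10, 0), off = c(0, 2, 2, 6))

idx <- match_in_order(stop_sequence$ref, stop_onoff$ref)
DF <- cbind(stop_sequence, stop_onoff[idx, c("on", "off")])  # NA index -> NA row
rownames(DF) <- NULL

# loading: running total of boardings minus alightings, NA treated as 0
on0 <- ifelse(is.na(DF$on), 0, DF$on)
off0 <- ifelse(is.na(DF$off), 0, DF$off)
DF$loading <- cumsum(on0 - off0)
DF
```

Here cumsum(on0 - off0) is the same quantity as the cumsum(on) - cumsum(off) used in the dtw version. The scan calls stop() when a stop_onoff row has no remaining matching stop, so out-of-order input fails loudly instead of producing a shifted table.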