Re: [R] Obtaining data from a different row of data frame

arun Sun, 22 Sep 2013 08:20:07 -0700

Ira,
No problem.
If you change the ?for() loop to ?lapply, there should be an increase in speed
(usually that is not the case; here it is, though it is still not as fast as
the method I showed).


df1<- structure(list(Dates = structure(c(13151, 13152, 13153, 13154,
 13157, 13158, 13159, 13160, 13161, 13164), class = "Date"), P1 = c(10,
 13, 16, 19, 22, 25, 28, 31, 34, 37), P2 = c(100, 102, 104, 106,
 108, 110, 112, 114, 116, 118), P3 = c(90, 94, 98, 102, 106, 110,
 114, 118, 122, 126), P4 = c(70, 75, 80, 85, 90, 95, 100, 105,
 110, 115), OF1 = c(3, 3, 4, 5, 2, 2, 2, 1, 1, 5), OF2 = c(5,
 3, 4, 2, 1, 2, 2, 1, 1, 0), OF3 = c(4, 3, 4, 1, 3, 2, 2, 1, 1,
 0), OF4 = c(3, 5, 4, 2, 3, 1, 2, 1, 1, 0)), .Names = c("Dates",
 "P1", "P2", "P3", "P4", "OF1", "OF2", "OF3", "OF4"), row.names = c(NA,
 -10L), class = "data.frame")
df1$OF2[9]<-4

df2<- df1
 df2[,10:13]<- NA
colnames(df2)[10:13]<- paste0("newPrice",1:4)

##your code

for(j in 2:5) {
 df2[j+8] = df2[df2[,j+4] + row(df2)[,j], j]
 }

#using ?lapply()
 df1[,10:13]<-lapply(2:5,function(j) {df1[df1[,j+4]+row(df2)[,j],j]})
colnames(df1)[10:13]<- colnames(df2)[10:13]
 identical(df1,df2)
#[1] TRUE
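
To make the indexing in the loop above concrete, here is a minimal sketch on a toy data frame. The names toy, target and newPrice are made up for illustration and are not part of the original data. Each row i takes its price from row i + OF[i], and a target past the last row comes back as NA rather than raising an error.

```r
# Toy data: one price column and one offset column (illustrative names)
toy <- data.frame(P  = c(10, 13, 16, 19, 22),
                  OF = c(2, 0, 1, 3, 0))

# Target row for each row i is i + OF[i]
target <- seq_len(nrow(toy)) + toy$OF
target
# [1] 3 2 4 7 5

# Indexing past the last row yields NA instead of an error
toy$newPrice <- toy$P[target]
toy$newPrice
# [1] 16 13 19 NA 22
```

Row 4 asks for row 7, which does not exist in a 5-row table, so its new price is NA.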

#######Speed check:
set.seed(29)
 df2<- data.frame(Dates=seq(as.Date("2006-01-03"),length.out=2000,by="1 day"),
  cbind(matrix(sample(10:120,2000*300,replace=TRUE),ncol=300),
  matrix(sample(0:6,2000*300,replace=TRUE),ncol=300)))
 colnames(df2)[2:301]<- paste0("P",1:300)
 colnames(df2)[302:601]<- paste0("OF",1:300)
 df3<- df2


df2[,602:901]<-NA
 colnames(df2)[602:901]<- paste0("newPrice",1:300)

system.time({
 for(j in grep("^P",colnames(df2))) {
  df2[j+600] = df2[df2[,j+300] + row(df2)[,j], j]
  }
 })
 # user  system elapsed 
 #11.652   0.148  11.822 

system.time({df3[,602:901]<-lapply(2:301,function(j) 
{df3[df3[,j+300]+row(df3)[,j],j]}) })
#  user  system elapsed 
#  2.960   0.000   2.962 
colnames(df3)[602:901]<- colnames(df2)[602:901]
 identical(df2,df3)
#[1] TRUE
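
One detail that may be worth trying, offered only as an untimed sketch: row(df)[,j] builds a full matrix of row numbers on every call, whereas the row numbers can be computed once with seq_len(). The small data frame d and the names a, b and rows below are assumptions for illustration; whether this changes the timings above has not been measured here.

```r
# Small stand-in for the data above: 2 price columns, 2 offset columns
set.seed(1)
d <- data.frame(P1  = sample(10:120, 8),
                P2  = sample(10:120, 8),
                OF1 = sample(0:3, 8, replace = TRUE),
                OF2 = sample(0:3, 8, replace = TRUE))

# Form used in the thread: row() creates an nrow x ncol index matrix each time
a <- lapply(1:2, function(j) d[d[, j + 2] + row(d)[, j], j])

# Sketch: compute the row numbers once and index the column vector directly
rows <- seq_len(nrow(d))
b <- lapply(1:2, function(j) d[[j]][d[[j + 2]] + rows])

# Both forms should give the same shifted prices
all.equal(a, b)
```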

A.K.
 






________________________________
From: Ira Sharenow <irasharenow...@yahoo.com>
To: arun <smartpink...@yahoo.com> 
Sent: Sunday, September 22, 2013 10:49 AM
Subject: Re: [R] Obtaining data from a different row of data frame



Arun,

Thanks for the time you spent helping me.

I always learned to use the apply family (but maybe your strategies are 
faster), and now I think I am going to learn Hadley Wickham’s methods. Right 
now I need to do other parts of the project. In a few days I will take another 
look at your code to see if I can get more out of my code.

For my current project, once I am finished my boss will use my code and possibly 
modify it, so speed is just one factor. Transparency and his future coding time 
are other considerations. I need to balance these against each other.

I need tolerable speed and relatively easy-to-understand code. It is an 
interesting trade-off.

Thanks again for your help. I’ll get back to you when I take another look at the 
details of what you wrote.

Ira 
On 9/21/2013 11:27 PM, arun wrote:

Hi,
A modified version of the code that avoids ?sapply():
df1<- structure(list(Dates = structure(c(13151, 13152, 13153, 13154,
 13157, 13158, 13159, 13160, 13161, 13164), class = "Date"), P1 = c(10,
 13, 16, 19, 22, 25, 28, 31, 34, 37), P2 = c(100, 102, 104, 106,
 108, 110, 112, 114, 116, 118), P3 = c(90, 94, 98, 102, 106, 110,
 114, 118, 122, 126), P4 = c(70, 75, 80, 85, 90, 95, 100, 105,
 110, 115), OF1 = c(3, 3, 4, 5, 2, 2, 2, 1, 1, 5), OF2 = c(5,
 3, 4, 2, 1, 2, 2, 1, 1, 0), OF3 = c(4, 3, 4, 1, 3, 2, 2, 1, 1,
 0), OF4 = c(3, 5, 4, 2, 3, 1, 2, 1, 1, 0)), .Names = c("Dates",
 "P1", "P2", "P3", "P4", "OF1", "OF2", "OF3", "OF4"), row.names = c(NA,
 -10L), class = "data.frame")
df1$OF2[9]<-4

df2<- df1
 df2[,10:13]<- NA
colnames(df2)[10:13]<- paste0("newPrice",1:4)

##your code
for(j in 2:5) {
 df2[j+8] = df2[df2[,j+4] + row(df2)[,j], j]
 }
indx1<- unlist(df1[,grep("OF",colnames(df1))],use.names=FALSE)
 indx1[rep(seq(nrow(df1)),4)%in% 6:10][indx1[rep(seq(nrow(df1)),4)%in% 6:10]-
  rep(5:1,4)>=0]<- NA
val1<- unlist(df1[,grep("P",colnames(df1))],use.names=FALSE)
 df1[,10:13]<- val1[indx1+seq_along(indx1)]
 colnames(df1)[10:13]<- colnames(df2)[10:13]
identical(df1[,10:13],df2[,10:13])
#[1] TRUE

###On a bigger dataset:
set.seed(29)
 df2<- data.frame(Dates=seq(as.Date("2006-01-03"),length.out=2000,by="1 day"),
  cbind(matrix(sample(10:120,2000*300,replace=TRUE),ncol=300),
  matrix(sample(0:6,2000*300,replace=TRUE),ncol=300)))
 colnames(df2)[2:301]<- paste0("P",1:300)
 colnames(df2)[302:601]<- paste0("OF",1:300)
 df3<- df2

df2[,602:901]<-NA
 colnames(df2)[602:901]<- paste0("newPrice",1:300)
 system.time({
 for(j in grep("^P",colnames(df2))) {
  df2[j+600] = df2[df2[,j+300] + row(df2)[,j], j]
  }
 })
#   user  system elapsed
 #  8.508   0.000   8.523

colN_OF<- ncol(df3[,grep("OF",colnames(df3))])
system.time({
 indx1<- unlist(df3[,grep("OF",colnames(df3))],use.names=FALSE)
 indx1[rep(seq(nrow(df3)),colN_OF) %in% 
1995:2000][indx1[rep(seq(nrow(df3)),colN_OF) %in% 1995:2000] - 
rep(6:1,colN_OF)>=0] <-NA
  val1<- unlist(df3[,grep("P",colnames(df3))],use.names=FALSE)
  df3[,602:901]<- val1[indx1+seq_along(indx1)]
  colnames(df3)[602:901]<- colnames(df2)[602:901]
 })
#  user  system elapsed 
#  0.568   0.000   0.569

identical(df2,df3)
#[1] TRUE

A.K.

----- Original Message -----
From: arun <smartpink...@yahoo.com>
To: Ira Sharenow <irasharenow...@yahoo.com> 
Cc: 
Sent: Sunday, September 22, 2013 1:28 AM
Subject: Re: [R] Obtaining data from a different row of data frame

Ira,
I tried with a bigger dataset to look for any errors in the code:
set.seed(29)
 df2<- data.frame(Dates=seq(as.Date("2006-01-03"),length.out=2000,by="1 day"),
  cbind(matrix(sample(10:120,2000*300,replace=TRUE),ncol=300),
  matrix(sample(0:6,2000*300,replace=TRUE),ncol=300)))
 colnames(df2)[2:301]<- paste0("P",1:300)
 colnames(df2)[302:601]<- paste0("OF",1:300)
 df3<- df2

df2[,602:901]<-NA
 colnames(df2)[602:901]<- paste0("newPrice",1:300)
 system.time({
 for(j in grep("^P",colnames(df2))) {
  df2[j+600] = df2[df2[,j+300] + row(df2)[,j], j]
  }
 })
#   user  system elapsed 
 # 9.584   0.000   9.601

 vec1<- 6:1 ##change values according to the range of actual values in your rows.
 vec2<- 1995:2000 ##change accordingly. If the maximum value is say 100, take
 ## 100 rows from the tail end.  Change the vec1 also so that both are of the
 ## same length
system.time({
 df3[vec2,grep("OF",colnames(df3))]<- t(sapply(seq_along(vec1),function(i) 
{x1<-as.matrix(df3[vec2[i],grep("OF",colnames(df3))]); x1[x1>=vec1[i]]<-NA; 
x1}))
 indx1<- unlist(df3[,grep("OF",colnames(df3))],use.names=FALSE)
 val1<- unlist(df3[,grep("P",colnames(df3))],use.names=FALSE)
  df3[,602:901]<- val1[indx1+seq_along(indx1)]
  colnames(df3)[602:901]<- colnames(df2)[602:901]
 })
#   user  system elapsed 
 # 0.552   0.000   0.553

identical(df2[,602:901],df3[,602:901])
#[1] TRUE

A.K.
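
The vectorised method in the quoted messages works on the columns flattened into one long vector, so an offset that runs past the last row would land in the next column instead of giving NA. That seems to be why the tail rows are set to NA before indexing. A minimal sketch of the idea follows, on made-up toy data. The names toy, val, offs, rows and new are illustrative, and the guard rows + offs > n is an assumed general form of the tail-row condition used above, not the original code.

```r
# Toy data with two price columns and two offset columns (illustrative names)
toy <- data.frame(P1  = c(10, 13, 16, 19),
                  P2  = c(100, 102, 104, 106),
                  OF1 = c(1, 0, 2, 1),
                  OF2 = c(0, 3, 1, 0))
n <- nrow(toy)

# Flatten column by column: P1 rows 1..n, then P2 rows 1..n
val  <- unlist(toy[, c("P1", "P2")], use.names = FALSE)
offs <- unlist(toy[, c("OF1", "OF2")], use.names = FALSE)

# Row number of each flattened element within its own column
rows <- rep(seq_len(n), times = 2)

# Guard: an offset that would run past the last row must not spill
# into the next column, so mark it NA before indexing
offs[rows + offs > n] <- NA

# Position k in the flattened vector looks up position k + offset
new <- matrix(val[seq_along(offs) + offs], nrow = n)
new
#      [,1] [,2]
# [1,]   13  100
# [2,]   13  106
# [3,]   NA  106
# [4,]   NA  106
```

Without the guard, row 3 of P1 (offset 2) would pick up flattened position 5, which is the first value of P2 (100), not a missing value.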

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
