I am running R 2.1.1 in a Microsoft Windows XP environment.
   
  I have a matrix with three vectors (“columns”) and ~2 million “rows”.  The 
three vectors are date_, id, and price.  The data is ordered (sorted) by code 
and date_.
   
  (The matrix contains daily prices for several thousand stocks, and has ~2 
million “rows”. If a stock did not trade on a particular date, its price is set 
to “NA”)
   
  I wish to add a fourth vector that is “next_price”. (“Next price” is the 
current price as long as the current price is not “NA”.  If the current price 
is NA, the “next_price” is the next price that the security with this same ID 
trades.  If the stock does not trade again,  “next_price” is set to NA.)
   
  I wrote the following loop to calculate next_price.  It works as intended, 
but I have one problem.  When I have only 10,000 rows of data, the calculations 
are very fast.  However, when I run the loop on the full 2 million rows, it 
seems to take ~ 1 second per row.
   
  Why is this happening?  What can I do to speed the calculations when running 
the loop on the full 2 million rows?
   
  (I am not running low on memory, but I am maxing out my CPU at 100%)
   
  Here is my code and some sample data:
   
  data<- data[order(data$code,data$date_),] 
  l<-dim(data)[1]
  w<-3
  data[l,w+1]<-NA
   
  for (i in (l-1):(1)){
  
data[i,w+1]<-ifelse(is.na(data[i,w])==F,data[i,w],ifelse(data[i,2]==data[i+1,2],data[i+1,w+1],NA))
  }
   
   
  date      id         price     next_price
  6/24/2005        1635    444.7838         444.7838
  6/27/2005        1635    448.4756         448.4756
  6/28/2005        1635    455.4161         455.4161
  6/29/2005        1635    454.6658         454.6658
  6/30/2005        1635    453.9155         453.9155
  7/1/2005          1635    453.3153         453.3153
  7/4/2005          1635    NA      453.9155
  7/5/2005          1635    453.9155         453.9155
  7/6/2005          1635    453.0152         453.0152
  7/7/2005          1635    452.8651         452.8651
  7/8/2005          1635    456.0163         456.0163
  12/19/2005      1635    442.6982         442.6982
  12/20/2005      1635    446.5159         446.5159
  12/21/2005      1635    452.4714         452.4714
  12/22/2005      1635    451.074           451.074
  12/23/2005      1635    454.6453         454.6453
  12/27/2005      1635    NA      NA
  12/28/2005      1635    NA      NA
  12/1/2003        1881    66.1562           66.1562
  12/2/2003        1881    64.9192           64.9192
  12/3/2003        1881    66.0078           66.0078
  12/4/2003        1881    65.8098           65.8098
  12/5/2003        1881    64.1275           64.1275
  12/8/2003        1881    64.8697           64.8697
  12/9/2003        1881    63.5337           63.5337
  12/10/2003      1881    62.9399           62.9399

                
---------------------------------

        [[alternative HTML version deleted]]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Reply via email to