Hi
I have a problem when looping: the number of rows of the first data group seems to get stuck and is reused for every later group.

Let me explain the program. I want to run randomForest on 200 stocks and get a score for each of them. First I point at the training data file (with data from all 200 stocks), then at the prediction data file (200 rows, one per stock, with unknown target). Finally I point at a folder where the file with the score for each stock is saved.

This almost works. The problem is that the number of training rows differs from stock to stock: one stock may have 50 rows and another 100. When I run this code, the number of rows from the first stock is used for all stocks. For example, if stock 1 has 50 rows, the calculation for stock 2 also uses 50 rows. As a result, the scores in the result file differ from the ones I get when I do the calculations one stock at a time.

What am I doing wrong?

Kind regards
Rolf

The data in the files looks like this:

Train file:

STOCK.NAME  Indicator1  Indicator2  Indicator3  Indicator4  Indicator5  Indicator6  Indicator7  Indicator8  Indicator9  Indicator10 Action
Stock.1     0.53464     0.809136    0.090641    0.212288    0.817402    0.976926    0.383471    0.119862    0.369533    0.374066    Buy
Stock.1     0.907586    0.421417    0.292742    0.78914     0.263374    0.597003    0.420898    0.582622    0.666901    0.71218     Notbuy
Stock.1     0.682471    0.501301    0.160167    0.753329    0.426113    0.874266    0.752404    0.535917    0.26929     0.30212     Notbuy
Stock.1     0.156847    0.057765    0.345092    0.148373    0.79769     0.927548    0.797175    0.4544      0.135831    0.767282    Buy
Stock.2     0.177951    0.506193    0.075647    0.719628    0.52613     0.131471    0.140883    0.926419    0.393547    0.292262    Notbuy
Stock.2     0.525604    0.152735    0.033175    0.780946    0.037649    0.733622    0.128549    0.763801    0.493194    0.008631    Buy

Predict file (the Action column is empty because the target is unknown):

STOCK.NAME  Indicator1  Indicator2  Indicator3  Indicator4  Indicator5  Indicator6  Indicator7  Indicator8  Indicator9  Indicator10 Action
Stock.1     0.53464     0.809136    0.090641    0.212288    0.817402    0.976926    0.383471    0.119862    0.369533    0.374066
Stock.2     0.907586    0.421417    0.292742    0.78914     0.263374    0.597003    0.420898    0.582622    0.666901    0.71218

Code:

rm(list=ls())
require(randomForest, quietly=TRUE)

# Reading the TRAINING data...
dtot=read.csv(choose.files(caption="Choose the TRAINING data..."))

# Reading the NEW data...
newtot=read.csv(choose.files(caption="Choose the NEW data..."))

# One entry per stock name found in the first column
stk=names(table(dtot[,1]))

# Today's date as YYYYMMDD, used in the output file name
date=paste(
  strsplit(as.character(Sys.Date()),"-")[[1]][1],
  strsplit(as.character(Sys.Date()),"-")[[1]][2],
  strsplit(as.character(Sys.Date()),"-")[[1]][3],
  sep="")

res=matrix(0,length(stk),2)

for (i in 1:length(stk))
{
  # Training rows of stock i, without the stock name column
  d=dtot[which(dtot[,1]==stk[i]),-1]
  #write.csv(d,paste(stk[i],".csv",sep=""),row.names=F)
  #write.csv(newtot[i,-c(1,12)],paste(stk[i],"_PRED.csv",sep=""),row.names=F)
  #}

  # Build the Random Forest model.
  set.seed(42)
  rf <- randomForest(Action ~ ., data=d, ntree=500, mtry=3,
                     importance=TRUE, na.action=na.roughfix, replace=FALSE)

  # Last training row stacked on top of the new row i (name and Action columns dropped)
  test=rbind(dtot[nrow(dtot),-c(1,12)],newtot[i,-c(1,12)])

  # Obtain probability scores for the Random Forest model
  # (row 2 is the new row, column 2 is the second class column)
  res[i,2]=predict(rf, test, type="prob")[2,2]
}

res=data.frame(res)
names(res)=c("Stockname","Score")
res[,1]=stk

# Output the combined data.
setwd(choose.dir(caption="Choose the FOLDER where you want the send the results..."))
write.csv(res, file=paste("Stock1_newdata_score_all_",date,".csv",sep=""), row.names=FALSE)
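
To narrow this down, here is a minimal check of the subsetting step alone. It is only a sketch on made-up toy data (three columns, six rows, not my real files), and it does not call randomForest at all. It records how many rows `d` has for each stock inside the same kind of loop and compares that with the counts in the full table. The names `rows_used` and `rows_expected` exist only in this sketch.

```r
# Toy training data (made-up values): Stock.1 has 4 rows, Stock.2 has 2 rows.
dtot <- data.frame(
  STOCK.NAME = c(rep("Stock.1", 4), rep("Stock.2", 2)),
  Indicator1 = c(0.53, 0.91, 0.68, 0.16, 0.18, 0.53),
  Action     = factor(c("Buy", "Notbuy", "Notbuy", "Buy", "Notbuy", "Buy"))
)

stk <- names(table(dtot[, 1]))

# Same subsetting as in the loop above: rows of stock i, name column dropped.
rows_used <- integer(length(stk))
names(rows_used) <- stk
for (i in seq_along(stk)) {
  d <- dtot[which(dtot[, 1] == stk[i]), -1]
  rows_used[i] <- nrow(d)
}

# Row counts taken directly from the full table, for comparison.
rows_expected <- table(dtot[, 1])

print(rows_used)
print(rows_expected)
```

If the two sets of counts also match on the real training file, the subsetting is picking up the right rows for each stock, and the difference in scores would have to come from another step in the loop.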