Hi,

I have a problem: my loop seems to get stuck on the number of rows that the
first data group has.

Let me explain the program:

I want to run randomForest on 200 stocks and get a score for each of them.
First I point to the training data file (with data from the 200 stocks),
then I point to the prediction data file (with 200 rows of data from the 200
stocks, with the target unknown).

Finally I point to a folder in which to save the file with the score for
each stock.

 

This almost works. The problem is that the number of rows of data in the
training file differs from stock to stock: one stock may have 50 rows and
another may have 100. When I run this code, the number of rows from the
first stock is used for all stocks. For example, if stock 1 has 50 rows, the
calculation for stock 2 will also use 50 rows.
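To make the row counts visible, a small check along the following lines could be run before the loop. This is only a sketch: the toy data frame below is made up for illustration (it is not the real training file), and `rows_per_stock` and `rows_in_loop` are hypothetical helper names. With the real data, `dtot` would come from `read.csv` as in the script.

```r
# Toy stand-in for the training data (made-up values, not the real file)
dtot <- data.frame(
  STOCK.NAME = c(rep("Stock.1", 4), rep("Stock.2", 2)),
  Indicator1 = c(0.53, 0.91, 0.68, 0.16, 0.18, 0.53),
  Action     = c("Buy", "Notbuy", "Notbuy", "Buy", "Notbuy", "Buy")
)

# Number of rows each stock has in the whole file
rows_per_stock <- table(dtot$STOCK.NAME)
print(rows_per_stock)

# Number of rows selected per stock, using the same subsetting as in the loop
stk <- names(rows_per_stock)
rows_in_loop <- sapply(stk, function(s) nrow(dtot[which(dtot[, 1] == s), -1]))
print(rows_in_loop)
```

If the two printouts agree, each model is trained on its own stock's rows, and the difference in scores would have to come from somewhere else.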

 

So the scores in the result file differ from the ones I get when I do the
calculations one stock at a time.

 

What am I doing wrong?

 

Kind regards

Rolf

   

 

The data in the files looks like this:

Train file:

STOCK.NAME,Indicator1,Indicator2,Indicator3,Indicator4,Indicator5,Indicator6,Indicator7,Indicator8,Indicator9,Indicator10,Action
Stock.1,0.53464,0.809136,0.090641,0.212288,0.817402,0.976926,0.383471,0.119862,0.369533,0.374066,Buy
Stock.1,0.907586,0.421417,0.292742,0.78914,0.263374,0.597003,0.420898,0.582622,0.666901,0.71218,Notbuy
Stock.1,0.682471,0.501301,0.160167,0.753329,0.426113,0.874266,0.752404,0.535917,0.26929,0.30212,Notbuy
Stock.1,0.156847,0.057765,0.345092,0.148373,0.79769,0.927548,0.797175,0.4544,0.135831,0.767282,Buy
Stock.2,0.177951,0.506193,0.075647,0.719628,0.52613,0.131471,0.140883,0.926419,0.393547,0.292262,Notbuy
Stock.2,0.525604,0.152735,0.033175,0.780946,0.037649,0.733622,0.128549,0.763801,0.493194,0.008631,Buy

Predict file (the Action column is empty):

STOCK.NAME,Indicator1,Indicator2,Indicator3,Indicator4,Indicator5,Indicator6,Indicator7,Indicator8,Indicator9,Indicator10,Action
Stock.1,0.53464,0.809136,0.090641,0.212288,0.817402,0.976926,0.383471,0.119862,0.369533,0.374066,
Stock.2,0.907586,0.421417,0.292742,0.78914,0.263374,0.597003,0.420898,0.582622,0.666901,0.71218,

The code:

rm(list=ls())

library(randomForest, quietly=TRUE)

# Read the TRAINING data.
# stringsAsFactors=TRUE keeps Action a factor (needed for classification in R >= 4.0)
dtot <- read.csv(choose.files(caption="Choose the TRAINING data..."),
                 stringsAsFactors=TRUE)

# Read the NEW data
newtot <- read.csv(choose.files(caption="Choose the NEW data..."))

# Stock names found in the training data (names(table()) returns them sorted)
stk <- names(table(dtot[,1]))

# Today's date as YYYYMMDD, used in the output file name
date <- format(Sys.Date(), "%Y%m%d")

# One row per stock: column 1 = stock name (filled in later), column 2 = score
res <- matrix(0, length(stk), 2)

for (i in seq_along(stk))
{
  # Training rows for stock i, without the STOCK.NAME column
  d <- dtot[which(dtot[,1]==stk[i]), -1]

  #write.csv(d,paste(stk[i],".csv",sep=""),row.names=F)
  #write.csv(newtot[i,-c(1,12)],paste(stk[i],"_PRED.csv",sep=""),row.names=F)

  # Build the Random Forest model
  set.seed(42)
  rf <- randomForest(Action ~ .,
        data=d,
        ntree=500,
        mtry=3,
        importance=TRUE,
        na.action=na.roughfix,
        replace=FALSE)

  # Row 1: last training row; row 2: row i of the new data
  # (both without STOCK.NAME and Action).
  # The new data is taken by position, so row i is assumed to belong to stk[i].
  test <- rbind(dtot[nrow(dtot), -c(1,12)], newtot[i, -c(1,12)])

  # Probability score for the new row (row 2), second level of Action (column 2)
  res[i,2] <- predict(rf, test, type="prob")[2,2]
}

res <- data.frame(res)
names(res) <- c("Stockname","Score")
res[,1] <- stk

# Output the combined data
setwd(choose.dir(caption="Choose the FOLDER where you want to send the results..."))
write.csv(res, file=paste("Stock1_newdata_score_all_",date,".csv",sep=""),
          row.names=FALSE)
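
One thing that may be worth checking in the loop above: `names(table(...))` returns the stock names in sorted order (typically Stock.10 sorts before Stock.2), while `newtot[i, ]` takes the new data by row position. If the two orders differ, a stock could be scored with another stock's new data. The following is a minimal sketch of looking up the new row by name instead; the toy data is made up, and `row_for_stock`, `by_position` and `by_name` are hypothetical names used only for illustration.

```r
# Toy stand-ins (made-up values); the real data would come from read.csv
dtot <- data.frame(STOCK.NAME = c("Stock.1", "Stock.2", "Stock.10"),
                   Indicator1 = c(0.1, 0.2, 0.3))
newtot <- data.frame(STOCK.NAME = c("Stock.1", "Stock.2", "Stock.10"),
                     Indicator1 = c(0.5, 0.6, 0.7))

# Same way of collecting stock names as in the script
stk <- names(table(dtot[, 1]))
print(stk)

# Row of newtot that belongs to each entry of stk, looked up by name
row_for_stock <- match(stk, newtot$STOCK.NAME)

# What the loop would pick by position versus what a lookup by name picks
by_position <- newtot[seq_along(stk), "Indicator1"]
by_name     <- newtot[row_for_stock, "Indicator1"]
print(data.frame(stk, by_position, by_name))
```

Inside the loop, `newtot[row_for_stock[i], -c(1,12)]` would then replace `newtot[i, -c(1,12)]`. Whether this explains the differing scores depends on how the real files are ordered.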

 



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
