On 20.06.2013 16:46, David martin wrote:
Hi,

When I use errorest on a large dataset (12000 variables), it runs very slowly. The randomForest package documentation says that the formula interface is discouraged for large datasets, so it is better to pass x and y directly, as in the example below:

    rf <- randomForest(x=df[trainindices,-1], y=df[trainindices,1], xtest=df[testindices,-1], ytest=df[testindices,1], do.trace=5, ntree=500)

Would it be possible to modify errorest so that it accepts x and y rather than a formula? I think that would increase speed on large datasets.

    errorest(type~., data=mydate, model=randomForest, mtry=2)  # will perform slowly
    errorest(x=variables, y=type, data=mydate, model=randomForest, mtry=2)  # would perform faster if implemented
Talk to the maintainer of the package you found errorest() in?

Best,
Uwe Ligges
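Independently of what errorest() supports, the cross-validation can also be run by hand with the x/y interface. The following is only a minimal sketch, not part of ipred or randomForest: the helper name `cv_error_xy` and its arguments `fit`, `pred`, and `k` are hypothetical, and it assumes a classification problem where `y` is a factor and `x` is a data frame or matrix of predictors.

```r
# Hypothetical helper (not from any package): k-fold cross-validated
# misclassification error using an x/y interface instead of a formula.
cv_error_xy <- function(x, y, fit, pred = predict, k = 10, ...) {
  stopifnot(is.factor(y), nrow(x) == length(y), k >= 2, k <= length(y))
  # Randomly assign each observation to one of k folds
  folds <- sample(rep(seq_len(k), length.out = length(y)))
  wrong <- 0
  for (i in seq_len(k)) {
    test <- folds == i
    # Fit on the training folds; no formula or model frame is built here
    model <- fit(x = x[!test, , drop = FALSE], y = y[!test], ...)
    # Predict the held-out fold and count misclassified observations
    yhat <- pred(model, x[test, , drop = FALSE])
    wrong <- wrong + sum(as.character(yhat) != as.character(y[test]))
  }
  wrong / length(y)
}

# Usage sketch on a small built-in dataset; runs only if randomForest is installed
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  err <- cv_error_xy(iris[, -5], iris$Species,
                     fit = randomForest::randomForest, k = 5, ntree = 50)
  print(err)
}
```

Extra arguments such as `ntree` or `mtry` are passed through `...` to the fitting function. Whether this is actually faster than the formula interface on a 12000-variable dataset would need to be measured.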
thanks,
david

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.