Dear R-users,
I am looking for a way to "parallelize" my PLSR predictions in order to 
save processing time. I tried to use the "foreach" construct with "doParallel" 
(see the 2nd part of the code below), but I was unable to store both the 
predicted values and the model performance parameter (RMSEP) in the output 
variable (all in the 2nd part).
My code:
set.seed(10000)   # generate some data...
mat <- replicate(100, rnorm(100))
y <- as.matrix(mat[,1], drop=F)
x <- mat[,2:100]
eD <- dist(x, method = "euclidean")  # distance matrix to find close samples
eDm <- as.matrix(eD)
kns <- matrix(NA,nrow(x),10)  # empty matrix to allocate 10 closest samples
for (i in 1:nrow(eDm)) {   # identify closest samples in a loop and allocate to kns
  kns[i,] <- head(order(eDm[,i]), 11)[-1]
}
I consider the code so far to be "safe", but the next part is challenging for 
me, since I have never used the "foreach" construct before:
library(pls)
library(foreach)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
out <- foreach(j = 1:nrow(mat), .combine="rbind", .packages="pls") %dopar% {
  pls <- plsr(y ~ x, ncomp=5, validation="CV", subset=kns[j,])
  predict(pls, ncomp=5, newdata=x[j,,drop=F])
  RMSEP(pls, estimate="CV")$val[1,1,5]
}
stopCluster(cl)
As I understand it, the 3rd-to-last code line starting with "RMSEP(pls,..." is 
simply overwriting the previously written data from the "predict" code line. 
Somehow I was assuming the .combine option would take care of this?
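To illustrate the behaviour in question, here is a minimal sketch. It assumes that each foreach iteration contributes only the value of the last expression in its body, and that .combine merely joins those per-iteration values. The numbers are made-up stand-ins rather than PLSR output, and %do% is used instead of %dopar% so that it runs without a cluster:

```r
library(foreach)

# Each iteration yields only the value of its LAST expression,
# so both results are bundled into one named vector before returning.
out <- foreach(j = 1:3, .combine = "rbind") %do% {
  pred  <- j * 2     # hypothetical stand-in for the predict() result
  rmsep <- j / 10    # hypothetical stand-in for the RMSEP() value
  c(pred = pred, rmsep = rmsep)
}

print(out)  # one row per iteration, columns "pred" and "rmsep"
```

If that assumption holds, the same pattern might apply to the PLSR loop by returning something like c(pred = ..., rmsep = ...) as the final expression of the body.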
Many thanks for your help!
Best, Chega
