Dear R programmers,

I wrote an adapted implementation of the Kennard-Stone algorithm for
sample selection from multivariate data (R 2.7.1 on a MacBook Pro,
2.2 GHz Intel Core 2 Duo processor, 2 GB 667 MHz DDR2 SDRAM).
The heart of the script consists of three nested loops, which makes it
very slow, especially for large datasets. For a data matrix of
1853*1853 and the selection of 556 samples, the computation took more
than 24 hours.
I did some research on vectorization, but I could not figure out how
to make it better/faster. What ways are there to replace the
time-consuming loops?

Here is some information about the inputs:

# val.n <- 24;
# start.b <- matrix(nrow=1812, ncol=20);
# val is a vector of the row names of 22 extreme samples chosen in an
# earlier step (passed as val.start in the call below);
# euc <- matrix(nrow=1853, ncol=1853); [contains the Euclidean
# distance calculations]
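
In case it helps to try things without my data, here is a minimal sketch of inputs with similar shapes. Everything in it is an assumption for illustration only: the random numbers stand in for the real spectra, the sample names "s1", "s2", ... are made up, and the first 22 rows simply play the role of the extreme samples, so start.b has 1831 rows here instead of 1812.

```r
# Hypothetical stand-in data, NOT the real spectra
set.seed(1)
n.all <- 1853                                  # total number of samples
spectra <- matrix(rnorm(n.all * 20), nrow = n.all, ncol = 20,
                  dimnames = list(paste("s", seq_len(n.all), sep = ""), NULL))

# Full Euclidean distance matrix; as.matrix() keeps the sample names
# as row and column names, so euc can be indexed by name
euc <- as.matrix(dist(spectra))

# Assumption: the first 22 samples act as the pre-selected extreme samples
val.start <- rownames(spectra)[1:22]

# All remaining samples are the candidates for selection
start.b <- spectra[!(rownames(spectra) %in% val.start), , drop = FALSE]

val.n <- 24                                    # total number of samples wanted
```

With these objects the call KEN.STO(val.n, start.b, val.start, euc) has the same form as the one timed below.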

The following system.time() output is for the selection of
two samples:
system.time(KEN.STO(val.n,start.b,val.start,euc))
    user  system elapsed
  25.294  13.262  38.927

The function:

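KEN.STO <- function(val.n, start.b, val, euc) {
  # val.n:   total number of samples to select
  # start.b: matrix of candidate samples (row names = sample names)
  # val:     row names of the samples selected so far
  # euc:     Euclidean distance matrix, indexed by sample name
  for (k in 1:val.n) {
    cand <- rownames(start.b)
    sum.dist <- numeric(length(cand))
    for (i in seq_along(cand)) {
      # distances from candidate i to every selected sample
      d <- numeric(length(val))
      for (j in seq_along(val)) {
        d[j] <- euc[cand[i], val[j]]
      }
      # keep the distance to the nearest selected sample
      sum.dist[i] <- min(d)
    }
    # choose the candidate that is farthest from its nearest selected sample
    bla <- cand[which(sum.dist == max(sum.dist))]
    val <- c(val, bla[1])
    # remove the chosen sample from the candidates
    start.b <- start.b[cand != bla[1], , drop = FALSE]
    if (length(val) >= val.n) break
  }
  return(val)
}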
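
One possible direction, given only as a sketch and not as a verified replacement: because euc is indexed by name, the two inner loops amount to taking row minima of a sub-matrix, and after each selection only the distances to the newly chosen sample can lower those minima, so a running minimum with pmin() may be enough. The name KEN.STO.vec and the small random data in the usage lines are assumptions for illustration. Unlike the loop above, this version adds nothing if val already holds val.n samples.

```r
# Sketch of a vectorized variant (hypothetical name KEN.STO.vec)
KEN.STO.vec <- function(val.n, start.b, val, euc) {
  cand <- rownames(start.b)
  # distance from every candidate to its nearest already selected sample
  min.dist <- apply(euc[cand, val, drop = FALSE], 1, min)
  while (length(val) < val.n && length(cand) > 0) {
    pick <- which.max(min.dist)        # first maximum, like bla[1] above
    new <- cand[pick]
    val <- c(val, new)
    # drop the chosen sample from the candidates
    cand <- cand[-pick]
    min.dist <- min.dist[-pick]
    # only distances to the new sample can reduce the running minimum
    min.dist <- pmin(min.dist, euc[cand, new])
  }
  val
}

# Usage on small made-up data (assumed, not the real spectra)
set.seed(42)
x <- matrix(rnorm(60), nrow = 30,
            dimnames = list(paste("s", 1:30, sep = ""), NULL))
euc.small <- as.matrix(dist(x))
val.small <- c("s1", "s2")
start.small <- x[!(rownames(x) %in% val.small), , drop = FALSE]
sel <- KEN.STO.vec(6, start.small, val.small, euc.small)
```

The design choice here is to avoid recomputing all minima in every iteration: each step costs one column lookup in euc instead of a full pass over all selected samples.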

Regards,

Thomas

Dr. Thomas Terhoeven-Urselmans
Post-Doc Fellow
Soil infrared spectroscopy
World Agroforestry Center (ICRAF) 

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.