Dear Patrick,

thanks for the very helpful response. The calculation now runs 25 times faster.
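A minimal sketch of the selection loop with the inner-most loop removed as suggested in the quoted reply, and with a 'while' condition in place of the 'for(k ...)'/'break' pair. The toy data (six points on a line, labelled "s1" to "s6") and the function name ken_sto_sketch are made up for illustration; this is not the script that was actually timed.

```r
# Hypothetical toy data: 6 points on a line, labelled "s1".."s6".
pts <- c(0, 1, 2, 5, 9, 10)
nms <- paste0("s", seq_along(pts))
euc <- as.matrix(dist(pts))            # Euclidean distance matrix
dimnames(euc) <- list(nms, nms)

# Only the row names of start.b are used here; val holds the samples
# that were already chosen in an earlier step.
start.b <- matrix(0, nrow = 4, ncol = 2,
                  dimnames = list(c("s2", "s3", "s4", "s5"), NULL))
val <- c("s1", "s6")

# Assumed name; follows the argument order of KEN.STO.
ken_sto_sketch <- function(val.n, start.b, val, euc) {
  # 'while' replaces the for(k ...)/break pair, so no counter is needed.
  while (length(val) < val.n && nrow(start.b) > 0) {
    cand <- rownames(start.b)
    sum.dist <- numeric(length(cand))  # preallocated instead of grown
    for (i in seq_along(cand)) {
      # Replacement for the inner-most loop: distance from candidate i
      # to its nearest already-selected sample.
      sum.dist[i] <- min(euc[cand[i], val])
    }
    pick <- cand[which.max(sum.dist)]  # first maximum, like bla[1]
    val <- c(val, pick)
    # drop = FALSE keeps start.b a matrix even when one row is left.
    start.b <- start.b[cand != pick, , drop = FALSE]
  }
  val
}

res <- ken_sto_sketch(4, start.b, val, euc)
print(res)
```

The 'while' condition would also cover rounds that add more than one sample, since it only checks the length of 'val'. The remaining loop over 'i' could probably be collapsed as well, for example with apply(euc[cand, val, drop = FALSE], 1, min), though that goes beyond what was suggested in this thread.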
I use the 'k' from the outer-most loop only indirectly. It sets the maximum number of repetitions of the whole script, which runs until the following command applies: 'if(length(val.x.c)>=val.x.c.n)break'. The reason I use this 'break' instead of a plain 'for(k in 1:val.x.c.n){' is that in some other applications of this algorithm more than one sample can be chosen in one round. Is there another/faster way to avoid this usage of 'k'?

Regards,

Thomas

On 14 Jan 2009, at 12:52, Patrick Burns wrote:

> You are definitely in Circle 2 of the R Inferno.
> Growing objects is suboptimal, although your
> objects are small so this probably isn't taking
> too much time.
>
> There is no need for the inner-most loop:
>
> sum.dist[i] <- min(euc[rownames(start.b)[i],val] )
>
> Maybe I'm blind, but I don't see where 'k' comes
> in from the outer-most loop.
>
>
> Patrick Burns
> patr...@burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of "The R Inferno" and "A Guide for the Unwilling S User")
>
>
> Thomas Terhoeven-Urselmans wrote:
>> Dear R-programmer,
>>
>> I wrote an adapted implementation of the Kennard-Stone algorithm
>> for sample selection of multivariate data (R 2.7.1 under MacBook
>> Pro, Processor 2.2 GHz Intel Core 2 Duo, Memory 2 GB 667 MHz DDR2
>> SDRAM).
>> The heart of the script uses three nested loops, which makes it
>> very slow, especially for huge datasets. For a data matrix of
>> 1853*1853 and the selection of 556 samples, the computation took
>> more than 24 hours.
>> I did some research on vectorization, but I could not figure out
>> how to do it better/faster. Which ways are there to replace the
>> time-consuming loops?
>>
>> Here is some information:
>>
>> # val.n<-24;
>> # start.b<-matrix(nrow=1812, ncol=20);
>> # val is a vector of the rownames of 22 extreme samples chosen in
>> # an earlier step;
>> # euc<-matrix(nrow=1853, ncol=1853); [contains the Euclidean
>> # distance calculations]
>>
>> The following system.time measurement was for the selection
>> of two samples:
>> system.time(KEN.STO(val.n,start.b,val.start,euc))
>>    user  system elapsed
>>  25.294  13.262  38.927
>>
>> The function:
>>
>> KEN.STO<-function(val.n,start.b,val,euc){
>>
>> for(k in 1:val.n){
>>   sum.dist<-c();
>>   for(i in 1:length(start.b[,1])){
>>     sum<-c();
>>     for(j in 1:length(val)){
>>       sum[j]<-euc[rownames(start.b)[i],val[j]]
>>     }
>>     sum.dist[i]<-min(sum);
>>   }
>>   bla<-rownames(start.b)[which(sum.dist==max(sum.dist))]
>>   val<-c(val,bla[1]);
>>   start.b<-start.b[-(which(match(rownames(start.b),val[length(val)])!="NA")),];
>>   if(length(val)>=val.n)break;
>> }
>> return(val);
>> }
>>
>> Regards,
>>
>> Thomas
>>
>> Dr. Thomas Terhoeven-Urselmans
>> Post-Doc Fellow
>> Soil infrared spectroscopy
>> World Agroforestry Center (ICRAF)
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

Regards,

Thomas

Dr. Thomas Terhoeven-Urselmans
Post-Doc Fellow
Soil infrared spectroscopy
World Agroforestry Center (ICRAF)
United Nations Avenue, Gigiri
PO Box 30677-00100
Nairobi, Kenya
Ph: 254 20 722 4113 or via USA 1 650 833 6654 ext. 4113
Fax 254 20 722 4001 or via USA 1 650 833 6646
Email: t.urselm...@cgiar.org
Internet: http://worldagroforestrycentre.org