Dear Patrick,

Thanks for the very helpful response. The calculation now runs 25 times
faster.

I use the 'k' from the outer-most loop only indirectly. It sets the
maximum number of repetitions of the whole script; the loop stops
earlier as soon as the following command applies:

'if(length(val.x.c)>=val.x.c.n)break'.

The reason I use this 'break' instead of just the
'for(k in 1:val.x.c.n){' command is that in some other applications of
this algorithm more than one sample can be chosen in one round.

Is there another, faster way that avoids this use of 'k'?
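
One possibility I can think of, as a minimal sketch only: a 'while' loop
tests the stopping condition directly, so no counter 'k' is needed. The
helper 'pick.next' and the toy sample names below are hypothetical; the
helper just stands in for one round of selection, which may return more
than one sample. I would not expect this to be noticeably faster than
'for' plus 'break'; it only removes the unused counter.

```r
# Hypothetical stand-in for one selection round; may return several samples.
pick.next <- function(chosen, candidates, per.round = 2) {
  head(setdiff(candidates, chosen), per.round)
}

candidates <- paste0("s", 1:10)   # toy sample names (assumption)
val.x.c <- c("s1", "s2")          # samples chosen so far
val.x.c.n <- 7                    # target number of samples

# Loop on the condition itself instead of on a counter plus 'break'.
while (length(val.x.c) < val.x.c.n) {
  val.x.c <- c(val.x.c, pick.next(val.x.c, candidates))
}
val.x.c
```

As with the '>=' in the 'break' version, the result can overshoot the
target when a round adds more than one sample.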

Regards,

Thomas

On 14 Jan 2009, at 12:52, Patrick Burns wrote:

> You are definitely in Circle 2 of the R Inferno.
> Growing objects is suboptimal, although your
> objects are small so this probably isn't taking
> too much time.
>
> There is no need for the inner-most loop:
>
> sum.dist[i] <- min(euc[rownames(start.b)[i],val] )
>
> Maybe I'm blind, but I don't see where 'k' comes
> in from the outer-most loop.
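
To illustrate the suggestion above on a toy example (the matrix and the
sample names are made up for illustration, and 'euc' is assumed to carry
row and column names): indexing 'euc' with one row name and the whole
vector 'val' returns all distances at once, so 'min' gives the same value
as the inner loop.

```r
set.seed(1)
# Toy symmetric distance matrix with row and column names (assumption).
pts <- matrix(rnorm(12), nrow = 6, dimnames = list(paste0("s", 1:6), NULL))
euc <- as.matrix(dist(pts))
val <- c("s1", "s2")                 # already chosen samples
cand <- c("s3", "s4", "s5", "s6")    # remaining candidates

# Loop version: one matrix element at a time.
loop.min <- numeric(length(cand))
for (i in seq_along(cand)) {
  d <- numeric(length(val))
  for (j in seq_along(val)) d[j] <- euc[cand[i], val[j]]
  loop.min[i] <- min(d)
}

# Without the inner loop: one row slice per candidate.
vec.min <- sapply(cand, function(r) min(euc[r, val]))

# Possibly going further: row minima of the whole sub-matrix at once.
all.min <- apply(euc[cand, val, drop = FALSE], 1, min)
```

All three vectors hold the same minima; the last two are also named by
candidate.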
>
>
> Patrick Burns
> patr...@burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of "The R Inferno" and "A Guide for the Unwilling S User")
>
>
> Thomas Terhoeven-Urselmans wrote:
>> Dear R-programmer,
>>
>> I wrote an adapted implementation of the Kennard-Stone algorithm for
>> sample selection from multivariate data (R 2.7.1 on a MacBook Pro,
>> 2.2 GHz Intel Core 2 Duo processor, 2 GB 667 MHz DDR2 SDRAM).
>> The heart of the script uses three nested loops, which makes it very
>> slow, especially for large datasets. For a data matrix of 1853*1853
>> and the selection of 556 samples, the computation took more than
>> 24 hours.
>> I did some research on vectorization, but I could not figure out how
>> to do it better or faster. What ways are there to replace the
>> time-consuming loops?
>>
>> Here is some information:
>>
>> # val.n<-24;
>> # start.b<-matrix(nrow=1812, ncol=20);
>> # val is a vector of the row names of 22 extreme samples chosen in
>> #   an earlier step;
>> # euc<-matrix(nrow=1853, ncol=1853); [contains the Euclidean
>> #   distance calculations]
>>
>> The following system.time result was for the selection of two
>> samples:
>> system.time(KEN.STO(val.n,start.b,val.start,euc))
>>    user  system elapsed
>>  25.294  13.262  38.927
>>
>> The function:
>>
>> KEN.STO<-function(val.n,start.b,val,euc){
>>
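>> for(k in 1:val.n){
>>   sum.dist<-c();
>>   # for each remaining candidate, find the distance to its nearest chosen sample
>>   for(i in 1:length(start.b[,1])){
>>     sum<-c();
>>     for(j in 1:length(val)){
>>       sum[j]<-euc[rownames(start.b)[i],val[j]]
>>     }
>>     sum.dist[i]<-min(sum);
>>   }
>>   # choose the candidate farthest from the chosen set (first one on ties)
>>   bla<-rownames(start.b)[which(sum.dist==max(sum.dist))]
>>   val<-c(val,bla[1]);
>>   # drop the newly chosen sample from the candidates
>>   start.b<-start.b[-(which(!is.na(match(rownames(start.b),val[length(val)])))),];
>>   if(length(val)>=val.n)break;
>> }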
>> return(val);
>> }
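
For reference, a sketch of how the whole function might look without the
two inner loops. It is not the original code: it keeps a running vector
of each candidate's distance to its nearest chosen sample and updates it
with 'pmin' after every pick, rather than recomputing all minima. It
assumes 'euc' carries row and column names; the name 'ken.sto.fast', its
'candidates' argument (the row names of 'start.b') and the toy data are
made up for this illustration.

```r
ken.sto.fast <- function(val.n, candidates, val, euc) {
  # distance from each candidate to its nearest already-chosen sample
  near <- apply(euc[candidates, val, drop = FALSE], 1, min)
  while (length(val) < val.n && length(candidates) > 0) {
    pick <- candidates[which.max(near)]   # farthest candidate, first on ties
    val <- c(val, pick)
    keep <- candidates != pick
    candidates <- candidates[keep]
    # only the new pick can lower a candidate's nearest distance
    near <- pmin(near[keep], euc[candidates, pick])
  }
  val
}

# Toy data (assumption): 8 points, two of them pre-selected.
set.seed(42)
pts <- matrix(rnorm(16), nrow = 8, dimnames = list(paste0("s", 1:8), NULL))
euc <- as.matrix(dist(pts))
chosen <- ken.sto.fast(5, paste0("s", 3:8), c("s1", "s2"), euc)
chosen
```

Each round then costs one column lookup in 'euc' instead of a pass over
all chosen samples for every candidate.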
>>
>> Regards,
>>
>> Thomas
>>
>> Dr. Thomas Terhoeven-Urselmans
>> Post-Doc Fellow
>> Soil infrared spectroscopy
>> World Agroforestry Center (ICRAF)
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


Dr. Thomas Terhoeven-Urselmans
Post-Doc Fellow
Soil infrared spectroscopy
World Agroforestry Center (ICRAF)
United Nations Avenue, Gigiri
PO Box 30677-00100 Nairobi, Kenya
Ph: 254 20 722 4113 or via USA 1 650 833 6654 ext. 4113
Fax 254 20 722 4001 or via USA 1 650 833 6646
Email: t.urselm...@cgiar.org
Internet: http://worldagroforestrycentre.org






