Re: [R] how to replace my double for loop which is little efficient!

2010-12-27 Thread bbslover

Thank Berend,

It seems like that it is better to attach a PDF file for avoiding messy
code.

Yes, I want to obtain is Tanimoto coefficient and your web site "wikipedia"
is about this coefficient. I also search R site about tanimoto coefficient
and learn it more. 

About your code, I has saved and learned it. 

Thanks again


Kevin


-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164920.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to replace my double for loop which is little efficient!

2010-12-27 Thread Berend Hasselman



Found this:
http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29

and an R site search with "tanimoto" yielded some more interesting stuff.

The rest is up to you.

Berend
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164785.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to replace my double for loop which is little efficient!

2010-12-27 Thread Berend Hasselman


bbslover wrote:
> 
> thanks for your help. I am sorry I do not full understand your code, so i
> can not correct using your code to my data. here is the attachment of my
> data, and what I want to compute is the equation in the word document of
> the attachment:
> 
> the code form Berend can get the answer i want to get.
> 

I've seen what you want to do.
OpenOffice mangled the equations so I used an online conversion to pdf.
Thanks.

See my previous post for the fastest version. It's just converting your
formulas with some matrix algebra.

Berend
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164767.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread Berend Hasselman


djmuseR wrote:
> 
> On Sun, Dec 26, 2010 at 4:18 AM, bbslover  wrote:
> 
>>
>> x: is a matrix  202*263,  that is 202 samples, and 263 independent
>> variables
>>
>> num.compd<-nrow(x); # number of compounds
>> diss.all<-0
>> for( i in 1:num.compd)
>>   for (j in 1:num.compd)
>>  if (i!=j) {
>>
> 
> Isn't this just X'X?
> 
>>S1<-sum(x[i,]*x[j,])
>>
> Aren't each of S2 and S3 just diag(X'X)?
> 
>>S2<-sum(x[i,]^2)
>>
>S3<-sum(x[j,]^2)
>>sim2<-S1/(S2+S3-S1)
>>diss2<-1-sim2
>>diss.all<-diss.all+diss2}
>>
> 
> I tried
> s1 <- crossprod(x)
> s2 <- diag(s1)
> s3 <-outer(s2, s2, '+') - s1
> s1/s3
> 
> This yields a symmetric matrix with 1's along the diagonal and quantities
> between 0 and 1 in the off-diagonal. Something like it could conceivably
> be
> used as a similarity matrix. Is that what you're looking for with sim2?
> 
> I agree with Berend: it looks like a problem that could be easily solved
> with some matrix algebra. R can do matrix algebra quite efficiently,
> y'know...
> 
> (BTW, I tried this on a 1000 x 1000 input matrix:
> system.time(myfunc(x))
>user  system elapsed
>0.990.021.02
> 
> I expect it could be improved by an order of magnitude if one actually
> knew
> what you were computing... )
> 

I did some more work along Dennis' lines

xtx <- tcrossprod(x)
xtd <- diag(xtx)
xzz <- outer(xtd,xtd,'+')
zz  <- 1 - xtx/(xzz-xtx)
diss.all <- sum(zz)

this appears to give the desired result and it's quite a bit faster than my
alternative 2.
It would indeed be nice to know what is being computed.

Berend
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164755.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread bbslover

thanks for your help. I am sorry I do not full understand your code, so i can
not correct using your code to my data. here is the attachment of my data,
and what I want to compute is the equation in the word document of the
attachment:

the code form Berend can get the answer i want to get.

http://r.789695.n4.nabble.com/file/n3164741/my_data.rar my_data.rar 


-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164741.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread bbslover

thanks for your help, it is great. In addition, In the beginning, the format
of x is dataframe, and i run my code, it is so slow, after your help, I
change x for matirx, it is so quick. I am very grateful your kind help, and
your code is so good!

kevin
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164732.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread Dennis Murphy
Hi:


On Sun, Dec 26, 2010 at 4:18 AM, bbslover  wrote:

>
> Dear all,
>
> My double for loop as follows, but it is little efficient, I hope all
> friends can give me a "vectorized" program to replace my code. thanks
>
> x: is a matrix  202*263,  that is 202 samples, and 263 independent
> variables
>
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
>   for (j in 1:num.compd)
>  if (i!=j) {
>

Isn't this just X'X?

>S1<-sum(x[i,]*x[j,])
>
Aren't each of S2 and S3 just diag(X'X)?

>S2<-sum(x[i,]^2)
>
   S3<-sum(x[j,]^2)
>sim2<-S1/(S2+S3-S1)
>diss2<-1-sim2
>diss.all<-diss.all+diss2}
>

I tried
s1 <- crossprod(x)
s2 <- diag(s1)
s3 <-outer(s2, s2, '+') - s1
s1/s3

This yields a symmetric matrix with 1's along the diagonal and quantities
between 0 and 1 in the off-diagonal. Something like it could conceivably be
used as a similarity matrix. Is that what you're looking for with sim2?

I agree with Berend: it looks like a problem that could be easily solved
with some matrix algebra. R can do matrix algebra quite efficiently,
y'know...

(BTW, I tried this on a 1000 x 1000 input matrix:
system.time(myfunc(x))
   user  system elapsed
   0.990.021.02

I expect it could be improved by an order of magnitude if one actually knew
what you were computing... )

HTH,
Dennis

it will cost a long time to finish this computation! i really need "rapid"
> code to replace my code.
>
> thanks
>
> kevin
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to replace my double for loop which is little efficient!

2010-12-26 Thread Berend Hasselman


bbslover wrote:
> 
> x: is a matrix  202*263,  that is 202 samples, and 263 independent
> variables
> 
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
>for (j in 1:num.compd)
>   if (i!=j) {
> S1<-sum(x[i,]*x[j,])
> S2<-sum(x[i,]^2)
> S3<-sum(x[j,]^2)
> sim2<-S1/(S2+S3-S1)
> diss2<-1-sim2
> diss.all<-diss.all+diss2}
> 
> it will cost a long time to finish this computation! i really need "rapid"
> code to replace my code.
> 

Alternative 1:  j-loop only needs to start at i+1 so

for( i in 1:num.compd) {
for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
S1<-sum(x[i,]*x[j,])
S2<-sum(x[i,]^2)
S3<-sum(x[j,]^2)
sim2<-S1/(S2+S3-S1)
diss2<-1-sim2
diss2.all<-diss2.all+diss2
}
}
diss2.all <- 2 * diss2.all

On my pc this is about twice as fast as your version (with 202 samples and
263 variables)

Alternative 2: all sum() are not necessary. Use some matrix algebra:

xtx <- x %*% t(x)
diss3.all <- 0
for( i in 1:num.compd) {
for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
S1 <- xtx[i,j]
S2 <- xtx[i,i]
S3 <- xtx[j,j]
sim2<-S1/(S2+S3-S1)
diss2<-1-sim2
diss3.all<-diss3.all+diss2
}
}
diss3.all <- 2 * diss3.all

This is about four times as fast as alternative 1.

I'm quite sure that more expert R gurus can get some more speed up.

Note: I generated the x matrix with:
set.seed(1);x<-matrix(runif(202*263),nrow=202)
(Timings on iMac 2.16Ghz and using 64-bit R)

Berend

-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164262.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.