Re: [R] Any better way of optimizing time for calculating distances in the mentioned scenario??

2012-10-12 Thread Stefan Evert

On 12 Oct 2012, at 09:46, Purna chander wrote:

> 4) scenario4:
>> x<-read.table("query.vec")
>> v<-read.table("query.vec2")
>> v<-as.matrix(v)
>> d<-dist(rbind(v,x),method="manhattan")
>> m<-as.matrix(d)
>> m2<-m[1:nrow(v),(nrow(v)+1):(nrow(v)+nrow(x))]
>> print(m2[1,1:10])
> 
> time taken for running the code:
> real 0m0.445s
> user 0m0.401s
> sys 0m0.041s
> 1) Though scenario 4 is the fastest, it failed when matrix 'v' had
> more rows. An error occurred while converting the distance object
> 'd' to a matrix 'm'.
> For example: > m<-as.matrix(d)
>   resulted in the error: "Error: cannot allocate
> vector of size 922.7 MB".

That's because you're calculating a full distance matrix with (1+100) * 
(1+100) entries and then extracting the much smaller number of distance 
values (1 * 100) that you actually need.
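
To put rough numbers on this, here is a small sketch of the memory needed for the dense matrices involved. It assumes 8-byte doubles and ignores R's per-object overhead; the helper name matrix_mb and both row counts are made up for illustration and are not taken from the original data.

```r
# Approximate size in MB (1 MB = 1024^2 bytes) of a dense double matrix.
matrix_mb <- function(nrow, ncol) 8 * nrow * ncol / 1024^2

n_v <- 1000    # hypothetical number of rows in 'v'
n_x <- 10000   # hypothetical number of rows in 'x'

# as.matrix(dist(rbind(v, x))) allocates a full (n_v + n_x) x (n_v + n_x) matrix
full_mb  <- matrix_mb(n_v + n_x, n_v + n_x)
# only the n_v x n_x block of cross-distances is actually needed
cross_mb <- matrix_mb(n_v, n_x)

cat(sprintf("full: %.1f MB, cross: %.1f MB\n", full_mb, cross_mb))
```

With these made-up sizes the full matrix comes to about 923 MB, the same order of magnitude as the failed allocation quoted above, while the cross block needs about 76 MB.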

I have a use case with similar requirements, so ...

> 3) Any other ideas for optimizing the problem I'm facing?

... my experimental "wordspace" package includes a function dist.matrix() for 
calculating such cross-distance matrices.  The function is written in C code 
and doesn't handle NA's and NaN's properly, but it's considerably faster than 
the current implementation of dist().

I haven't uploaded the package to CRAN yet, but you should be able to install 
with
 
install.packages("wordspace", repos="http://R-Forge.R-project.org")
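
As a rough illustration, a cross-distance call might look like the sketch below. It assumes the interface documented for later releases of wordspace, dist.matrix(M, M2, method=), which may differ from the experimental 2012 build. The toy matrices and the base R fallback cross_manhattan() are hypothetical and are only there so that the snippet also runs without the package.

```r
set.seed(42)
v <- matrix(rnorm(3 * 4), nrow = 3)   # 3 query vectors (toy data)
x <- matrix(rnorm(5 * 4), nrow = 5)   # 5 target vectors (toy data)

# Base R fallback (hypothetical helper): Manhattan distance from every
# row of v to every row of x, without building the full square matrix.
cross_manhattan <- function(v, x) {
  tx <- t(x)  # one column per row of x
  d <- vapply(seq_len(nrow(v)),
              function(i) colSums(abs(tx - v[i, ])),
              numeric(nrow(x)))
  # vapply returns one column per row of v, so fill the result by row
  matrix(d, nrow = nrow(v), ncol = nrow(x), byrow = TRUE)
}

# Assumed wordspace interface: rows of M against rows of M2.
d <- if (requireNamespace("wordspace", quietly = TRUE)) {
  wordspace::dist.matrix(v, x, method = "manhattan")
} else {
  cross_manhattan(v, x)
}
dim(d)  # one row per vector in v, one column per vector in x
```

Either branch produces only the nrow(v) x nrow(x) block, so nothing the size of the full dist() result is ever allocated.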

Best,
Stefan


PS: Glad to see that daily builds on R-Forge work again -- that's an extremely 
useful feature to get beta testers for experimental package versions. :-)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Any better way of optimizing time for calculating distances in the mentioned scenario??

2012-10-12 Thread Purna chander
Dear All,

I'm dealing with a case where the 'manhattan' distance of each of 100
vectors from 1 other vectors is calculated. To achieve this, I tested
the following 4 scenarios:

1) scenario 1:
> x<-read.table("query.vec")
> v<-read.table("query.vec2")

> d<-matrix(nrow=nrow(v),ncol=nrow(x))
> for (i in 1:nrow(v)){
  + d[i,]<- sapply(1:nrow(x),function(z){dist(rbind(v[i,],x[z,]),method="manhattan")})
  + }
> print(d[1,1:10])

time taken for running the code is:
real 1m33.088s
user 1m32.287s
sys 0m0.036s

2) scenario2:

> x<-read.table("query.vec")
> v<-read.table("query.vec2")
> v<-as.matrix(v)
> d<-matrix(nrow=nrow(v),ncol=nrow(x))
> for (i in 1:nrow(v)){
   + tmp_m<-matrix(rep(v[i,],nrow(x)),nrow=nrow(x),byrow=T)
   + d[i,]<- rowSums(abs(tmp_m - x))
   + }
> print(d[1,1:10])

time taken for running the code is:
real 0m0.882s
user 0m0.854s
sys 0m0.025s

3) scenario3:

> x<-read.table("query.vec")
> v<-read.table("query.vec2")
> v<-as.matrix(v)
> d<-matrix(nrow=nrow(v),ncol=nrow(x))
> for (i in 1:nrow(v)){
  + d[i,]<-sapply(1:nrow(x),function(z){dist(rbind(v[i,],x[z,]),method="manhattan")})
  + }
> print(d[1,1:10])

time taken for running the code is:
real 1m3.817s
user 1m3.543s
sys 0m0.031s

4) scenario4:
> x<-read.table("query.vec")
> v<-read.table("query.vec2")
> v<-as.matrix(v)
> d<-dist(rbind(v,x),method="manhattan")
> m<-as.matrix(d)
> m2<-m[1:nrow(v),(nrow(v)+1):(nrow(v)+nrow(x))]  # rows of 'x' follow the rows of 'v' in rbind(v,x)
> print(m2[1,1:10])

time taken for running the code:
real 0m0.445s
user 0m0.401s
sys 0m0.041s


Queries:
1) Though scenario 4 is the fastest, it failed when matrix 'v' had
more rows. An error occurred while converting the distance object
'd' to a matrix 'm'.
 For example: > m<-as.matrix(d)
   resulted in the error: "Error: cannot allocate
vector of size 922.7 MB".

So, what can be done to convert a larger dist object into a matrix, or
how can the allocation size be increased?

2) I observed that the 'dist()' function calculates the distances
between all vectors present in a given matrix or data frame. Is it not
possible to calculate the distances of specific vectors from the other
vectors in a matrix using 'dist()'? That is, if a matrix 'x' has 20
rows, is it not possible to use 'dist()' to calculate only the
distances of the 1st row vector from the other 19 vectors?

3) Any other ideas for optimizing the problem I'm facing?
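
On query 2, one way to get only the distances of the first row from the remaining rows is to compute them directly rather than going through dist(). This is a minimal base R sketch on made-up data; the helper name manhattan_from_row is hypothetical.

```r
# Manhattan distance from row i of a numeric matrix to every other row.
manhattan_from_row <- function(m, i = 1) {
  others <- m[-i, , drop = FALSE]
  # sweep() subtracts row i from each remaining row, column by column
  rowSums(abs(sweep(others, 2, m[i, ])))
}

set.seed(1)
x <- matrix(runif(20 * 5), nrow = 20)   # toy matrix with 20 rows
d1 <- manhattan_from_row(x, 1)          # distances to the other 19 rows
length(d1)
```

The values agree with the first row of as.matrix(dist(x, method="manhattan")), but the full 20 x 20 matrix is never built.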

Regards,
Purnachander


