Carolina--
You did not specify your platform or sessionInfo(), but I suspect that
specific error message arises because your distance matrix is too long a
vector for R 2.15.2, which limits vectors to 2^31 - 1 elements:

> 2^31 - 1
[1] 2147483647
> 138037^2
[1] 19054213369

> 138037^2 / 2 > 2^31
[1] TRUE
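
To be precise, the vector that vegdist() tries to allocate has length
N * (N - 1) / 2 (the lower triangle of the distance matrix), which is the
expression you can see in your error message:

> N <- 138037
> N * (N - 1) / 2
[1] 9527037666
> N * (N - 1) / 2 > .Machine$integer.max
[1] TRUE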

The good news (and why I know about this) is that R 3.0, due out in April,
will allow longer vectors on 64-bit platforms:
http://developer.r-project.org/30update.txt
http://developer.r-project.org/

However, unless your computer is much, much faster than mine, I don't think
that you want to compute roughly 10 billion pairwise dissimilarities, and I'm
even more confident that you don't want to attempt agglomerative clustering
on such a matrix.  You may need to take repeated 1%, 5%, or 10% random
samples of your pixels, generate your clusterings, and then test whether
your results converge across subsets; a rough sketch follows below.  [If you
are pulling all pixels in a grid, you can do regular rather than random
sampling to get your subsets.]
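
Something along these lines is what I have in mind -- an untested sketch,
assuming your DF from the script below (pixel IDs in column 1, species in
the remaining columns), with the 1% fraction, the 10 repetitions, and the
average linkage all placeholders for you to adjust:

library(vegan)

set.seed(1)                                  # reproducible subsets
n <- nrow(DF)
trees <- lapply(1:10, function(i) {
  idx <- sample(n, size = ceiling(0.01 * n)) # 1% random sample of pixels
  d <- vegdist(DF[idx, 2:ncol(DF)], method = "jaccard", binary = TRUE)
  hclust(d, method = "average")              # agglomerative clustering on the subset
})
## For pixels on a regular grid, a systematic subset such as
## idx <- seq(1, n, by = 100) can replace sample().
## Then compare cluster memberships across the 10 trees to see whether
## the groupings converge.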

I hope that this helps, and I apologize for suggesting that you rethink how
you approach your problem.

Tom 2



On Mon, Feb 11, 2013 at 2:15 PM, Carolina Bello <caro.bell...@gmail.com> wrote:

> Hi,
> I have some problems with the vegdist function. I want to do a hierarchical
> cluster from 138037 pixels of 1 km^2 from a study area in the Colombian
> Andes. I have distribution models for 89 species, so I have a matrix with
> the pixels in the rows and the species in the columns, filled with the
> absence (0)/presence (1) of each species in each pixel. I think the bigger
> problem is that the agglomeration method in the hierarchical cluster needs
> the whole matrix, so I can't divide it.
>
> To do this I want to calculate a distance matrix with Jaccard, since I
> have binary data.
>
> The problem is that I have a matrix of 138037 rows (sites) and 89 columns
> (species). My script is:
>
>     rm(list = ls(all = TRUE))
>
>     gc() ## to clear anything left hidden in memory
>
>     memory.limit(size = 100000) # raise the memory limit (size is in MB),
>                                 # paging to the 1 TB HDD if RAM runs out
>
>     library(vegan) # vegdist() comes from the vegan package
>
>     DF = as.data.frame(MODELOS)
>
>     DF = na.omit(DF)
>
>     DISTAN = vegdist(DF[, 2:ncol(DF)], "jaccard")
>
> Almost immediately it produces the error:
>
>     Error en double(N * (N - 1)/2) : tamaño del vector especificado es muy grande
>
> (in English: "Error in double(N * (N - 1)/2) : specified vector size is too large").
>
> I think this is a memory error, but I don't know why, since I have a PC
> with 32 GB of RAM and a 1 TB HDD.
>
> I also tried to build a distance matrix with the function dist from the
> package proxy:
>
>     library(proxy)
>
>     vector = dist(DF, method = "Jaccard")
>
> It starts to run, but when it gets to 10 GB of RAM, a window announces
> that R encountered an error and will close; it then closes and starts a
> new session.
>
> I really don't know what is going on, much less how to solve it. Can
> anybody help me?
>
> thanks
>
> Carolina Bello IAVH-COLOMBIA
>


-- 
-------------------------------------------
Tom Philippi, Ph.D.
Quantitative Ecologist & Data Therapist
Inventory and Monitoring Program
National Park Service
(619) 523-4576
tom_phili...@nps.gov
http://science.nature.nps.gov/im/monitor


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
