Carolina--

You did not specify your platform or sessionInfo(), but I suspect that specific error message is because your distance matrix is too long a vector for R 2.15.2, which limits vectors to 2^31 - 1 elements:
> 2^31-1
[1] 2147483647
> 138037**2
[1] 19054213369
> 138037^2 / 2 > 2^31
[1] TRUE

The good news (and why I know about this) is that R 3.0, due out in April, will allow longer vectors on 64-bit platforms:

http://developer.r-project.org/30update.txt
http://developer.r-project.org/

However, unless your computer is much, much faster than mine, I don't think that you want to compute nearly 10 billion pairwise dissimilarities, and I'm even more confident that you don't want to attempt agglomerative clustering on such a matrix. You may need to take repeated 1%, 5%, or 10% random samples of your pixels, generate your clusterings, and then test whether your results converge across subsets; a rough sketch follows after my signature. [If you are pulling all pixels in a grid, you can do regular rather than random sampling to get your subsets.]

I hope that this helps, and I apologize for suggesting that you rethink how you approach your problem.

Tom

On Mon, Feb 11, 2013 at 2:15 PM, Carolina Bello <caro.bell...@gmail.com> wrote:

> Hi,
> I have some problems with the vegdist function. I want to build a
> hierarchical cluster from 138037 pixels of 1 km^2 covering a study area
> in the Colombian Andes. I have distribution models for 89 species, so I
> have a matrix with the pixels in the rows and the species in the columns,
> filled with the absence (0)/presence (1) of each species in each pixel.
> I think the bigger problem is that the agglomeration method in the
> hierarchical cluster needs the whole matrix, so I can't divide it.
>
> To do this I want to calculate a distance matrix with Jaccard. I have
> binary data.
>
> The problem is that I have a matrix of 138037 rows (sites) and 89 columns
> (species). My script is:
>
> rm(list=ls(all=T))
> gc() ## clear everything left hidden in memory
> memory.limit(size = 100000) # use the 1 TB HDD as extra memory in case
>                             # RAM runs out
> DF=as.data.frame(MODELOS)
> DF=na.omit(DF)
> DISTAN=vegdist(DF[,2:ncol(DF)],"jaccard")
>
> Almost immediately it produces the error:
>
> Error en double(N * (N - 1)/2) : tamaño del vector especificado es muy
> grande
> [i.e., "the specified vector size is too large"]
>
> I think this is a memory error, but I don't know why, since I have a PC
> with 32 GB of RAM and 1 TB of HDD.
>
> I also tried to build a distance matrix with the function dist from the
> package proxy. I did:
>
> library(proxy)
> vector=dist(DF, method = "Jaccard")
>
> It starts to run, but when it reaches 10 GB of RAM a window announces
> that R encountered an error and will close, so it closes and starts a
> new session.
>
> I really don't know what is going on, much less how to solve it. Can
> anybody help me?
>
> Thanks,
>
> Carolina Bello
> IAVH-COLOMBIA

--
-------------------------------------------
Tom Philippi, Ph.D.
Quantitative Ecologist & Data Therapist
Inventory and Monitoring Program
National Park Service
(619) 523-4576
tom_phili...@nps.gov
http://science.nature.nps.gov/im/monitor
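P.S. Here is a minimal sketch of the repeated-subsample idea, assuming your
presence/absence matrix is the DF from your script (ID in column 1, species
in the remaining columns); the sample fraction, the number of repeats, and
average linkage are placeholders for you to adjust:

  library(vegan)

  n     <- nrow(DF)     # 138037 pixels
  frac  <- 0.01         # try 1%, 5%, 10%
  nrep  <- 10           # number of repeated subsamples
  trees <- vector("list", nrep)

  set.seed(42)          # make the random subsets reproducible
  for (i in seq_len(nrep)) {
    idx <- sample(n, size = round(frac * n))        # random subset of pixels
    d   <- vegdist(DF[idx, 2:ncol(DF)], "jaccard")  # Jaccard dissimilarities
    trees[[i]] <- hclust(d, method = "average")     # pick your linkage
  }

  ## e.g., cut each tree into the same number of groups and check whether
  ## the groupings are stable across subsets:
  ## groups <- lapply(trees, cutree, k = 10)

For scale: at 10% (13804 pixels) each dist object holds about 95 million
values (~0.8 GB), which your 32 GB machine can manage; the full 138037
pixels would need about 9.5 billion values (~76 GB) even if R could
allocate the vector.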
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology