Etienne, We used Chi-square statistic to inspect randomness, see the code in summary.permat. To test the convergence of sequential algorithms, you can use time series and MCMC diagnostic tools.
Peter 2010/1/7 Etienne Laliberté <etiennelalibe...@gmail.com>: > Many thanks Jari for your input. > > I'll have a look at this backtracking method and see how I could > implement it. > > Ensuring that the null matrices are indeed random is clearly good > advice, and I'd need to do this, but do you have any suggestions on how > to do this in practice? A copy of the script you've used to test Peter's > null model algorithms would be very useful. > > Thanks > > Etienne > > > Le jeudi 07 janvier 2010 à 22:24 +0200, Jari Oksanen a écrit : >> On 07/01/2010 21:48, "Etienne Laliberté" <etiennelalibe...@gmail.com> wrote: >> >> > Many thanks again Carsten. >> >> Yes, you're right that care must be taken to >> > ensure that a decent number >> of unique random matrices must be obtained. I >> > don't think it would be a >> problem in my case given that transforming my >> > continuous abundance data >> to count by >> >> mat2<- floor(mat * 100 / min(mat[mat > >> > 0]) ) >> >> can quickly lead to a very large number of individuals (hundreds >> > of >> thousands if not > million in some cases). The issue then becomes >> > mostly >> one of computing speed, but that's another story. >> >> Following your >> > suggestion, I came up with a simple (read naïve) R >> algorithm that starts with >> > a binary matrix obtained by commsimulator >> then fills it randomly. It seems to >> > work relatively well, though it's >> extremely slow. >> >> In addition, because the >> > algorithm fills the matrix one individual at a >> time, I sometimes ran into the >> > problem of one individual which had >> nowhere to go because "ressources" at the >> > site (i.e. the row sum) was >> already full. In that case, this individual >> > disappears, i.e. goes >> nowhere. This is a bit drastic and clearly sub-optimal, >> > but it's the >> only way I could quickly think of yesterday to prevent the >> > algorithm >> from getting stuck. The consequence is that often, the total number >> > of >> individuals in the null matrix is a bit less than total number >> > of >> individuals in the observed matrix. >> >> Etienne, >> >> This is the same problem as filling up a binary matrix: you get stuck before >> you can fill in all presences. The solution in vegan::commsimulator was >> "backtracking" method: when you get stuck, you remove some of the items >> (backtrack to an earlier stage of filling) and start again. This is even >> slower than initial filling, but it may be doable. The model application of >> backtracking goes back to Gotelli & Entsminger (Oecologia 129, 282--291; >> 2001) who had this for binary data, but the same idea is natural for >> quantitative null models, too. The vegan vignette discusses some details of >> implementation which are valid also for quantitative filling. >> >> I would like to repeat the serious warning that Carsten Dormann made: you >> better be sure that your generated null models really are random. Péter >> Sólymos developed some quantitative null models for vegan, but we found in >> our tests that they were not random at all, but the constraints we put >> produced peculiar kind of data with regular features although there was >> technically no problem. Therefore we removed code worth of huge amount of >> developing work as dangerous for use. I do think that also the r2dtable >> approach is more regular and less variable (lower variance) than the >> community data, and you very easily get misleadingly significant results >> when you compare your observed community against too regular null models. >> This is something analogous to using strict Poisson error in glm or gam when >> the data in fact are overdispersed to Poisson. Be very careful if you want >> to claim significant results based on quantitative null models. Would I be a >> referee or an editor for this kind of ms, I would be very skeptical ask for >> the proof of the validity of the null model. >> >> Cheers, Jari Oksanen >> >> > I don't see this as being a >> > real >> problem with the large matrices I'll deal with though, but >> > definitely >> something to be aware of. >> >> If you're curious, here's the simple >> > algorithm, as well as some code to >> run a test below. I'd be very interested to >> > compare it to Vazquez's >> algorithm! >> >> Thanks again >> >> Etienne >> >> ### >> >> nullabun <- >> > function(x){ >> require(vegan) >> nsites <- nrow(x) >> nsp <- ncol(x) >> >> > rowsum <- rowSums(x) >> colsum <- colSums(x) >> nullbin <- commsimulator(x, >> > method = "quasiswap") >> rownull <- rowSums(nullbin) >> colnull <- >> > colSums(nullbin) >> rowsum <- rowsum - rownull >> colsum <- colsum - colnull >> >> > ind <- rep(1:nsp, colsum) >> ress <- rep(1:nsites, rowsum) >> total <- >> > sum(rowsum) >> selinds <- sample(1:total) >> for (j in 1:total){ >> >> > indsel <- ind[selinds[j]] >> sitepot <- which(nullbin[, indsel] > 0) >> >> > if (length(ress[ress %in% sitepot]) > 0 ){ >> sitesel <- >> > sample(ress[ress %in% sitepot], 1) >> ress <- ress[-which(ress == >> > sitesel)[1] ] >> nullbin[sitesel, indsel] <- nullbin[sitesel, indsel] >> > + 1 >> } >> } >> return(nullbin) >> } >> >> >> ### now let's test the >> > function >> >> # create a dummy abundance matrix >> m <- >> > matrix(c(1,3,2,0,3,1,0,2,1,0,2,1,0,0,1,2,0,3,0,0,0,1,4,3), 4, >> > 6, >> byrow=TRUE) >> >> # generate a null matrix >> m.null <- nullabun(m) >> >> # compare >> > total number of individuals >> sum(m) #30 >> sum(m.null) #29: one individual got >> > "stuck" with no ressources... >> >> # to generate more than one null matrix, say >> > 999 >> nulls <- replicate(n = 999, nullabun(m), simplify = F) >> >> # how many unique >> > null matrices? >> length(unique(nulls) ) # I found 983 out of 999 >> >> # how many >> > individuals in m? >> sum(m) # there are 30 >> >> # how bad is the problem of >> > individuals getting stuck with no ressources >> in null matrices? >> sums <- >> > as.numeric(lapply(nulls, sum, simplify = T)) >> hist(sums) # the vast majority >> > had 29 or 30 individuals >> >> >> Le jeudi 07 janvier 2010 à 10:34 +0100, Carsten >> > Dormann a écrit : >> > Hi Etienne, >> > >> > I'm afraid that swap.web cannot easily >> > accommodate this constraint. >> > Diego Vazquez has used an alternative approach >> > for this problem, but I >> > haven't seen code for it (it's briefly described in >> > his Oikos 2005 >> > paper). While swap.web starts with a "too-full" matrix and >> > then >> > downsamples, Diego starts by allocating bottom-up. There it should be >> > >> > relatively easy to also constrain the row-constancy of connectance. >> > >> > I >> > can't promise, but I will hopefully have this algorithm by the end >> > of next >> > week (I'm teaching this stuff next week and this is one of the >> > assignments). >> > If so, I'll get it over to you. >> > >> > The problem that already arises with >> > swap.web for very sparse matrices >> > is that there are very few unique >> > combinations left after all these >> > constraints. This will be even worse for >> > your approach, and plant >> > community data are usually VERY sparse. Once you >> > have produced the >> > null models, you should assure yourself that there are >> > hundreds >> > (better thousands) of possible randomised matrices. Adding just >> > one >> > more constraint to your null model (that of column AND row constancy >> > >> > of 0s) will uniquely define the matrix! Referees may not pick it up, >> > but it >> > may give you trivial results. >> > >> > Best wishes, >> > >> > Carsten >> > >> > >> > Vázquez, >> > D. P., Melián, C. J., Williams, N. M., Blüthgen, N., Krasnov, >> > B. R., Poulin, >> > R., et al. (2007). Species abundance and asymmetric >> > interaction strength in >> > ecological networks. Oikos, 116, 1120-1127. >> > >> > On 06/01/2010 23:57, Etienne >> > Laliberté wrote: >> > > Many thanks Carsten and Peter for your suggestions. >> > > >> > >> > > commsimulator indeed respects the two contraints I'm interested in, but> >> > > only allows for binary data. >> > > >> > > swap.web is *almost* what I need, but >> > only overall matrix fill is kept >> > > constant, whereas I want zeros to move >> > only between rows, not between >> > > both columns and rows. In others words, if >> > the initial data matrix had >> > > three zeros for row 1, permuted matrices >> > should also have three zeros >> > > for that row. >> > > >> > > I do not doubt that >> > Peter's suggestions are good, but I'm afraid they >> > > seem a bit overly >> > complicated for my particular problem. All I'm after >> > > is to create n >> > randomly-assembled matrices from an observed species >> > > abundance matrix to >> > compare the observed functional diversity of the >> > > sampling sites to a null >> > expectation. To be conservative, this requires >> > > that I hold species >> > richness constant at each site, and keep row and >> > > column marginals fixed. >> > >> > > >> > > Could swap.web or permatswap(..., method = "quasiswap") be easily >> > > >> > tweaked to accomodate this? The only difference really is that matrix >> > > fill >> > should be kept constant *but also* be constrained within rows. >> > > >> > > Thanks >> > again for your help. >> > > >> > > Etienne >> > > >> > > Le 7 janvier 2010 08:17, Peter >> > Solymos <soly...@ualberta.ca> a écrit : >> > > >> > > > Dear Etienne, >> > > > >> > > >> > > You can try the Chris Hennig's prablus package which have a parametric >> > > > >> > bootstrap based null-model where clumpedness of occurrences or >> > > > >> > abundances (this might allow continuous data, too) is estimated from >> > > > the >> > site-species matrix and used in the null-model generation. But >> > > > here, the >> > sum of the matrix will vary randomly. >> > > > >> > > > But if you have >> > environmental covariates, you might try something more >> > > > parametric. For >> > example the simulate.rda or simulate.cca functions in >> > > > the vegan package, >> > or fit multivariate LM for nested models (i.e. >> > > > intercept only, and with >> > other covariates) and compare AIC's, or use >> > > > the simulate.lm to get >> > random numbers based on the fitted model. This >> > > > way you can base you >> > desired statistic on the simulated data sets, and >> > > > you know explicitly >> > what is the model (plus it is good for continuous >> > > > data that you have). >> > By using the null-model approach, you implicitly >> > > > have a model by >> > defining constraints for the permutations, and >> > > > p-values are >> > probabilities of the data given the constraints (null >> > > > hypothesis), and >> > not probability of the null hypothesis given the data >> > > > (what people >> > usually really want). >> > > > >> > > > Cheers, >> > > > >> > > > Peter >> > > > >> > > > >> > Péter Sólymos >> > > > Alberta Biodiversity Monitoring Institute >> > > > Department >> > of Biological Sciences >> > > > CW 405, Biological Sciences Bldg >> > > > University >> > of Alberta >> > > > Edmonton, Alberta, T6G 2E9, Canada >> > > > Phone: >> > 780.492.8534 >> > > > Fax: 780.492.7635 >> > > > >> > > > >> > > > >> > > > On Wed, Jan 6, >> > 2010 at 2:18 AM, Carsten Dormann <carsten.dorm...@ufz.de> wrote: >> > > > >> > >> > > > > Hi Etienne, >> > > > > >> > > > > the double constraint is observed by two >> > functions: >> > > > > >> > > > > swap.web in package bipartite >> > > > > >> > > > > >> > and >> > > > > >> > > > > commsimulator in vegan (at least in the r-forge >> > version) >> > > > > >> > > > > Both build on the r2dtable approach, i.e. you have, >> > as you propose, to turn >> > > > > the low values into higher-value integers. >> > > >> > > > >> > > > > The algorithm is described in the help to swap.web. >> > > > > >> > > >> > > > >> > > > > HTH, >> > > > > >> > > > > Carsten >> > > > > >> > > > > >> > > > > >> > > > > >> > On 06/01/2010 08:55, Etienne Laliberté wrote: >> > > > > >> > > > > > Hi, >> > > >> > > > > >> > > > > > Let's say I have measured plant biomass for a total of 5 >> > species from 3 >> > > > > > sites (i.e. plots), such that I end with the >> > following data matrix >> > > > > > >> > > > > > mat<- matrix(c(0.35, 0.12, 0.61, 0, >> > 0, 0.28, 0, 0.42, 0.31, 0.19, 0.82, >> > > > > > 0, 0, 0, 0.25), 3, 5, byrow = >> > T) >> > > > > > >> > > > > > dimnames(mat)<- list(c("site1", "site2", "site3"), >> > c("sp1", "sp2", >> > > > > > "sp3", "sp4", "sp5")) >> > > > > > >> > > > > > Data is >> > therefore continuous. I want to generate n random community >> > > > > > matrices >> > which both respect the following constraints: >> > > > > > >> > > > > > 1) row and >> > column totals are kept constant, such that "productivity" of >> > > > > > each >> > site is maintained, and that rare species at a "regional" level >> > > > > > stay >> > rare (and vice-versa). >> > > > > > >> > > > > > 2) number of species in each plot >> > is kept constant, i.e. each row >> > > > > > maintains the same number of zeros, >> > though these zeros should not stay >> > > > > > fixed. >> > > > > > >> > > > > > To >> > deal with continuous data, my initial idea was to transform the >> > > > > > >> > continuous data in mat to integer data by >> > > > > > >> > > > > > mat2<- >> > floor(mat * 100 / min(mat[mat> 0]) ) >> > > > > > >> > > > > > where multiplying >> > by 100 is only used to reduce the effect of rounding >> > > > > > to nearest >> > integer (a bit arbitrary). In a way, shuffling mat could now >> > > > > > be seen >> > as re-allocating "units of biomass" randomly to plots. However, >> > > > > > >> > doing so results in a matrix with large number of "individuals" to >> > > > > > >> > reshuffle, which can slow things down quite a bit. But this is only part >> > > > >> > > > of the problem. >> > > > > > >> > > > > > My main problem has been to find an >> > algorithm that can actually respect >> > > > > > constraints 1 and 2. Despite >> > trying various R functions (r2dtable, >> > > > > > permatfull, etc), I have not >> > yet been able to find one that can do >> > > > > > this. >> > > > > > >> > > > > > >> > I've had some kind help from Peter Solymos who suggested that I try the >> > > > >> > > > aylmer package, and it's *almost* what I need, but the problem is that >> > > >> > > > > their algorithm does not allow for the zeros to move within the >> > matrix; >> > > > > > they stay fixed. I want the number of zeros to stay constant >> > within each >> > > > > > row, but I want them to move freely betweem columns. >> > > >> > > > > >> > > > > > Any help would be very much appreciated. >> > > > > > >> > > > > > >> > Thanks >> > > > > > >> > > > > > >> > > > > > >> > > > > -- >> > > > > Dr. Carsten >> > F. Dormann >> > > > > Department of Computational Landscape Ecology >> > > > > >> > Helmholtz Centre for Environmental Research-UFZ >> > > > > Permoserstr. 15 >> > > > >> > > 04318 Leipzig >> > > > > Germany >> > > > > >> > > > > Tel: ++49(0)341 2351946 >> > > > >> > > Fax: ++49(0)341 2351939 >> > > > > Email: carsten.dorm...@ufz.de >> > > > > >> > internet: http://www.ufz.de/index.php?de=4205 >> > > > > >> > > > > >> > _______________________________________________ >> > > > > R-sig-ecology mailing >> > list >> > > > > R-sig-ecology@r-project.org >> > > > > >> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology >> > > > > >> > > > > >> > > > > >> > >> > > >> > > >> > > >> > >> > -- >> > Dr. Carsten F. Dormann >> > Department of >> > Computational Landscape Ecology >> > Helmholtz Centre for Environmental >> > Research-UFZ >> > Permoserstr. 15 >> > 04318 Leipzig >> > Germany >> > >> > Tel: ++49(0)341 >> > 2351946 >> > Fax: ++49(0)341 2351939 >> > Email: carsten.dorm...@ufz.de >> > internet: >> > http://www.ufz.de/index.php?de=4205 >> >> > > > -- > Etienne Laliberté > ================================ > School of Forestry > University of Canterbury > Private Bag 4800 > Christchurch 8140, New Zealand > Phone: +64 3 366 7001 ext. 8365 > Fax: +64 3 364 2124 > www.elaliberte.info > > _______________________________________________ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology > > _______________________________________________ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology