Re: [R] how to subsample all possible combinations of n species taken 1:n at a time?

Eik Vettorazzi Mon, 06 Apr 2009 13:06:08 -0700

Hi Jasper,

maybe its not a very R'ish solution but the following could be astarting point:First, notice that every combination you are looking for can berepresented as an integer in binary notation where each bit stands for aspecific community.So looping through all combinations is the same as looping through0:(2^n-1) , eg. 7=2^0+2^2, so community 1 and 3 (natural numbering) are"set" in the corresponding subset.

#some auxillary functions are needed to extract the bits set in a givencombination

#get bit returns presence/absence vector of species for given subset x
get.bit<-function(x,n)  x%/%(2^(0:(n-1)))%%2

#index of data columns included in subset x
get.index<-function(x,n)  x%/%(2^(0:(n-1)))%%2==1

n<-15 # number of communities
dta<-matrix(1:(12*n),ncol=n) # some sample data

#looping thru *all* possible combinations of n communities.
#will work, but it is in fact *very*! time consuming for n=25
for (i in 0:(2^n-1))  {
sub.index<-get.index(i,n)
# subsetting your dataset using this index and do subset analysis
# dta[,sub.index]
}

hth



jasper slingsby schrieb:

Hello

I apologise for the length of this entry but please bear with me.

In short:
I need a way of subsampling communities from all possible communities of n
taxa taken 1:n at a time without having to calculate all possible

combinations (because this gives me a memory error - usingcombn() or expand.grid() at least). Does anyone know of a function? Or canyou help me edit thecombnorexpand.gridfunctions to generate subsamples?


In long:
I have been creating all possible communities of n taxa taken 1:n at a time
to get a presence/absence matrix of species occurrence in communities as
below...

Rows are samples, columns are species:

    A    B    C   D     .     .    .    .

1 0 1 1 1 0 0 0 1 1 1 1 0 00 00 1 1 1 1 0 0 0 1 1 1 1 0 00 01 1 1 1 1 0 0 0 1 1 1 1 0 00 00 0 0 0 0 1 0 0 1 1 1 1 0 00 01 0 0 0 0 1 0 0 1 1 1 1 0 00 00 1 0 0 0 1 0 0 1 1 1 1 0 00 01 1 0 0 0 1 0 0 1 1 1 1 0 00 00 0 1 0 0 1 0 0 1 1 1 1 0 00 0


...but the number of possible communities increases exponentially with each

added taxon.

n<-11     #number of taxa
sum(for (i in 0:n) choose(i, k = 0:i)) #number of combos

So all possible combinations of 11 taxa taken 1:11 at a time is 2048, all
combos of 12 taken 1:12 is 4096, 13 taken 1:13 = 8192...etc etc such that
when I reach about 25 taken 1:25 the number of combos is 33554432 and I get
a memory error.

I have found that the number of combos of x taxa taken from a pool of n

creates a very kurtotic unimodal distribution,...

x<-vector("integer",20)
for (i in 1:20) {x[i]<-choose(20,i)}
plot(x)

...but have found that limiting the number of samples for any community size
to 1000 is good enough for the further analyses I wish to do.
My problem lies in sampling all possible combos without having to calculate
all possible combos. I have tried two methods but both give memory errors at
about 25 taxa.

The expand.grid() method:

n <- 11toto <- vector("list",n)

titi <- lapply(toto,function(x) c(0,1))
tutu <- expand.grid(titi)

The combn() method (a slightly lengthlier function):

samplecommunityD<- function(n,numsamples)
{
super<-mat.or.vec(,n)
for (numspploop in 1:n)
{
  minor<-t(combn(n,numspploop))
  if (dim(minor)[1]<numsamples)
  {
    minot<-mat.or.vec(dim(minor)[1],n)
    for (loopi in 1:dim(minor)[1])
    {
      for (loopbi in 1:dim(minor)[2])
      {
        minot[loopi,minor[loopi,loopbi]] <- 1
      }
    }
    super<-rbind(super,minot)
    rm(minot)
  }
  else
 {
   minot<-mat.or.vec(numsamples,n)
   for (loopii in 1:numsamples)
   {
     thousand<-sample(dim(minor)[1],numsamples)
       for (loopbii in 1:dim(minor)[2])
       {
       minot[loopii,minor[thousand[loopii],loopbii]] <- 1
       }
   }
   super<-rbind(super,minot)
   rm(minot)
 }
}
super<-super[!rowSums(super)>n-1&!rowSums(super)<2,]
return(super)
}

samplecommunityD(11,1000)


So unless anyone knows of another function I could try my next step would be
to modify the combn or expand.grid functions to generate subsamples, but
their coding beyond me at this stage (I'm a 3.5 month newbie). Can anyone
identify where in the code I would need to introduce a sampling term or
skipping sequence?

Thanks for your time
Jasper


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to subsample all possible combinations of n species taken 1:n at a time?

Reply via email to