Hi Jasper,
maybe its not a very R'ish solution but the following could be a
starting point:
First, notice that every combination you are looking for can be
represented as an integer in binary notation where each bit stands for a
specific community.
So looping through all combinations is the same as looping through
0:(2^n-1) , eg. 7=2^0+2^2, so community 1 and 3 (natural numbering) are
"set" in the corresponding subset.
#some auxillary functions are needed to extract the bits set in a given
combination
#get bit returns presence/absence vector of species for given subset x
get.bit<-function(x,n) x%/%(2^(0:(n-1)))%%2
#index of data columns included in subset x
get.index<-function(x,n) x%/%(2^(0:(n-1)))%%2==1
n<-15 # number of communities
dta<-matrix(1:(12*n),ncol=n) # some sample data
#looping thru *all* possible combinations of n communities.
#will work, but it is in fact *very*! time consuming for n=25
for (i in 0:(2^n-1)) {
sub.index<-get.index(i,n)
# subsetting your dataset using this index and do subset analysis
# dta[,sub.index]
}
hth
jasper slingsby schrieb:
Hello
I apologise for the length of this entry but please bear with me.
In short:
I need a way of subsampling communities from all possible communities of n
taxa taken 1:n at a time without having to calculate all possible
combinations (because this gives me a memory error - using
combn() or expand.grid() at least). Does anyone know of a function? Or can
you help me edit the
combn
or
expand.grid
functions to generate subsamples?
In long:
I have been creating all possible communities of n taxa taken 1:n at a time
to get a presence/absence matrix of species occurrence in communities as
below...
Rows are samples, columns are species:
A B C D . . . .
1 0 1 1 1 0 0 0 1 1 1 1 0 0
0 0
0 1 1 1 1 0 0 0 1 1 1 1 0 0
0 0
1 1 1 1 1 0 0 0 1 1 1 1 0 0
0 0
0 0 0 0 0 1 0 0 1 1 1 1 0 0
0 0
1 0 0 0 0 1 0 0 1 1 1 1 0 0
0 0
0 1 0 0 0 1 0 0 1 1 1 1 0 0
0 0
1 1 0 0 0 1 0 0 1 1 1 1 0 0
0 0
0 0 1 0 0 1 0 0 1 1 1 1 0 0
0 0
...but the number of possible communities increases exponentially with each
added taxon.
n<-11 #number of taxa
sum(for (i in 0:n) choose(i, k = 0:i)) #number of combos
So all possible combinations of 11 taxa taken 1:11 at a time is 2048, all
combos of 12 taken 1:12 is 4096, 13 taken 1:13 = 8192...etc etc such that
when I reach about 25 taken 1:25 the number of combos is 33554432 and I get
a memory error.
I have found that the number of combos of x taxa taken from a pool of n
creates a very kurtotic unimodal distribution,...
x<-vector("integer",20)
for (i in 1:20) {x[i]<-choose(20,i)}
plot(x)
...but have found that limiting the number of samples for any community size
to 1000 is good enough for the further analyses I wish to do.
My problem lies in sampling all possible combos without having to calculate
all possible combos. I have tried two methods but both give memory errors at
about 25 taxa.
The expand.grid() method:
n <- 11
toto <- vector("list",n)
titi <- lapply(toto,function(x) c(0,1))
tutu <- expand.grid(titi)
The combn() method (a slightly lengthlier function):
samplecommunityD<- function(n,numsamples)
{
super<-mat.or.vec(,n)
for (numspploop in 1:n)
{
minor<-t(combn(n,numspploop))
if (dim(minor)[1]<numsamples)
{
minot<-mat.or.vec(dim(minor)[1],n)
for (loopi in 1:dim(minor)[1])
{
for (loopbi in 1:dim(minor)[2])
{
minot[loopi,minor[loopi,loopbi]] <- 1
}
}
super<-rbind(super,minot)
rm(minot)
}
else
{
minot<-mat.or.vec(numsamples,n)
for (loopii in 1:numsamples)
{
thousand<-sample(dim(minor)[1],numsamples)
for (loopbii in 1:dim(minor)[2])
{
minot[loopii,minor[thousand[loopii],loopbii]] <- 1
}
}
super<-rbind(super,minot)
rm(minot)
}
}
super<-super[!rowSums(super)>n-1&!rowSums(super)<2,]
return(super)
}
samplecommunityD(11,1000)
So unless anyone knows of another function I could try my next step would be
to modify the combn or expand.grid functions to generate subsamples, but
their coding beyond me at this stage (I'm a 3.5 month newbie). Can anyone
identify where in the code I would need to introduce a sampling term or
skipping sequence?
Thanks for your time
Jasper
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.