Re: [R] sample consecutive integers efficiently

Chris Oldmeadow Thu, 28 Aug 2008 17:16:57 -0700


Charles C. Berry wrote:

On Thu, 28 Aug 2008, Chris Oldmeadow wrote:
Hi all,
I have some rough code to sample consecutive integers with lengthaccording to a vector of lengths
#sample space (representing positions)
pos<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

#sample lengths
lengths<-c(2,3,2)

From these two vectors I need a vector of sampled positions.
the sampling is without replacement, making things tough as thesampled integers need to be consecutive. Im hoping somebody knows afaster way of doing it than I have. ATM its way to slow on largevectors.
samplePos<-function(l){
   start.pos<-sample(pos,1)
   end.pos<-start.pos+l-1
   posies<-start.pos:end.pos
   posies
}

s.start<-c()


newPos<-function(a){
   rp<-samplePos(a)
   #test sampled range is consecutive, if not resample
   if (length(rp) != rp[a]+1 -rp[1]){rp<-samplePos(a)}
   pos<-setdiff(pos,rp)
   rp[1]
}

newps<-c()
newps<-unlist(lapply(lengths,newPos))
I think the bottleneck may be on the setdiff() function - the samplespace is quite large so I dont think there would be too many rejections.
The bottleneck is in the formulation of the sampling scheme.
This is a simple combinatorics problem. There are 3360 possible values( prod(16:14) ) for the start positions of the three elements, and youcan form a bijection between 1:3360 and the individual samples. If thenumber of possible sample is small enough, it would be most efficientto sample from the corresponding integer vector and then translate itto the corresponding sample. For larger values where the number ofpossible samples become a challenge for 32-bit integer arithmetic, Iexpect this approach would be preferred:
Permute length ( pos ) - sum ( lengths ) + length( lengths ) distinct(consecutively labelled) elements:
    elz <- sample( length ( pos ) - sum ( lengths ) + length( lengths ) )


Take the lengths of the original objects to be

    z.lens <- rep( 1, length( elz ) )
    z.lens[ seq(along = lengths ) ] <- lengths

(i.e. objects longer than 1 appear first)
Determine the start positions of the objects as if they were laid downconsecutively according to the permutation:
    start <- head( cumsum( c(0, z.lens[ elz ]) ) + 1 , -1 )

Find the start positions of just those with lengths greater than 1

    gt.1 <- match( seq(along=lengths) , elz )

Report the start positions

    start[ gt.1 ]

---
If length( pos ) is large, you can rewrite the above to simply samplethe positions (in the ordering) of the objects with lengths greaterthan 1. You will have to revise the calculation of start and gt.1 inthat case.
HTH,

Chuck


 >
Many thanks,
Chris

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry                            (858) 534-2098
Dept of Family/PreventiveMedicine
E mailto:[EMAIL PROTECTED]                UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego92093-0901


Thanks! that worked perfectly.

Chris

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] sample consecutive integers efficiently

Reply via email to