[R] Efficiency challenge: MANY subsets

Johannes Graumann Fri, 16 Jan 2009 05:11:20 -0800

Hello,

I have a list of character vectors like this:


sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
  "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
  "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)

and another list of subset ranges like this:

indexes <- list(
  list(
    c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
  )
)
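Each inner element of "indexes" is an inclusive c(start, end) pair. Vectorized functions such as substring() take the starts and ends as separate vectors, so it can help to stack the pairs into a two-column matrix first. This is a minimal sketch that assumes every pair is a numeric vector of length 2; the names `ranges`, `range_mat` and `frag_len` are only illustrative:

```r
# Ranges for the first sequence, copied from indexes[[1]] above
ranges <- list(c(1,22), c(22,46), c(46,51), c(1,46), c(22,51), c(1,51))

# Stack the c(start, end) pairs row-wise into an n x 2 matrix
range_mat <- do.call(rbind, ranges)
colnames(range_mat) <- c("start", "end")

# Both ends are inclusive, hence the + 1
frag_len <- range_mat[, "end"] - range_mat[, "start"] + 1
```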

What I want to do now is subset each entry of "sequences" (e.g. 
sequences[[1]]) with every range in the corresponding lower-level list of 
"indexes" (e.g. indexes[[1]]). Here is what I came up with:

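fragments <- vector("list", length(sequences))  # preallocate the result
for(iN in seq_along(sequences)){  # seq_along() is safe for an empty list
  cat(iN, "\n")                   # progress output
  # lapply (not sapply) always returns a list, even when all ranges
  # happen to have the same length
  fragments[[iN]] <- lapply(
    indexes[[iN]],
    function(x){
      sequences[[iN]][seq.int(x[1],x[2])]  # inclusive slice start..end
    }
  )
}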
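The same loop can be written without explicit indexing by pairing each sequence with its own ranges via Map(). This is a sketch for readability rather than speed: it still makes one R-level function call per range, so a large speedup should not be expected from this form alone. The helper argument names `seq_vec`, `ranges` and `r` are illustrative:

```r
sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)
indexes <- list(
  list(c(1,22), c(22,46), c(46,51), c(1,46), c(22,51), c(1,51))
)

# Map() walks sequences and indexes in parallel; for every range the
# inner lapply() takes the inclusive slice seq_vec[start:end].
# lapply() always returns a list, so equal-length fragments are never
# simplified into a matrix.
fragments <- Map(
  function(seq_vec, ranges) {
    lapply(ranges, function(r) seq_vec[seq.int(r[1], r[2])])
  },
  sequences, indexes
)
```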

This works fine, but "sequences" contains thousands of entries and the 
corresponding "indexes" are sometimes hundreds of ranges long, so this whole 
process is EXTREMELY inefficient.
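One possible direction, assuming every element of a sequence is a single character (true for one-letter amino-acid codes) and that the fragments may be returned as strings rather than character vectors: collapse each sequence into one string once, then let substring(), which is vectorized over its start and stop arguments, cut all ranges of that sequence in a single call. This has not been benchmarked here, so it is worth timing on real data (e.g. with system.time()). The names `fragment_strings`, `starts` and `ends` are illustrative:

```r
sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)
indexes <- list(
  list(c(1,22), c(22,46), c(46,51), c(1,46), c(22,51), c(1,51))
)

fragment_strings <- Map(
  function(seq_vec, ranges) {
    # Pull the start and end of every c(start, end) pair into two vectors
    starts <- vapply(ranges, `[`, numeric(1), 1)
    ends   <- vapply(ranges, `[`, numeric(1), 2)
    # One paste() per sequence, then one vectorized substring() call
    # for all of its ranges (both ends inclusive)
    substring(paste(seq_vec, collapse = ""), starts, ends)
  },
  sequences, indexes
)

# If character vectors are needed afterwards, split the strings back:
# lapply(fragment_strings, strsplit, split = "")
```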

Is anybody out there up for the challenge of showing me how to speed this 
up?

Thanks for any hints,

Joh

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
