Hi Michael et al.

Assume I have a list of AlignedRead objects, with data from several Solexa lanes. I would like to get a coverage vector over all the lanes.

As 'coverage' takes only one AlignedRead object, I have two possibilities:

(a) Concatenate the AlignedRead objects to a single big one. As far as I can see, 'rbind2' is not implemented for AlignedRead, and 'append' seemed very slow to me. It is probably not posible to do this without making a copy of all the data in memory.

(b) Calculate the coverage for each AlignedRead object separately and add up the GenomeData objects. The `+` operator is not supported for GenomeData objects but it is for Rle objects. So I need to loop through the chromosomes.

I've now written this short function for the purpose:

sumUpCoverage <- function( lanes, seqLens )
{
   res <- NULL
   for( i in 1:length(lanes) ) {
      cvg <- coverage( lanes, width=seqLens )
      if( is.null( res ) )
         res <- cvg
      else {
         stopifnot( all( names(res) == names(cvg) ) )
         for( seq in names(res) )
            res[[seq]] <- res[[seq]] + cvg[[seq]]
      }
   }
   res
}

This does not seem very elegant (and it takes 9 minutes, which, however, might be ok for 29 mio reads). Any idea how to do it better? (The use of 'for' instead of 'sapply' is on purpose: I hope to save memory that way.)

And would it make sense to overload the `+` operator for GenomeData objects? If so, could I suggest adding this to BSgenome?

Cheers
  Simon

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to