Hi Jesper,
On 02/17/2014 01:10 AM, Jesper Gådin wrote:
Hi,
Have come across a cigar-vector that is problematic to process.
#load package
library(GenomicAlignments)
#load data (see attached file)
load("2014-02-17-cigarExample.rdata")
#run function cigarToRleList
cigarRle <- cigarToRleList(cigarExample)
Error in .Call2("Rle_constructor", values, lengths, check, 0L, PACKAGE =
"IRanges") :
integer overflow while summing elements in 'lengths'
You're reaching the limit of what can be represented with a
CompressedRleList object.
More precisely: cigarToRleList() returns a CompressedRleList object
which is a special kind of CompressedList object. A CompressedList
object is compact in memory and generally fast but it has the
restriction that the total length of all the list elements in it
must be <= 2^31 - 1 (i.e. <= .Machine$integer.max)
I don't know what you were planning to do with the object returned
by cigarToRleList() but note that you can get something similar by
calling explodeCigarOpLengths() and explodeCigarOps() separately.
These return the run lengths and run values you see in the RleList
object returned by cigarToRleList() but now in 2 separate objects:
> cigarToRleList(cigar2)
RleList of length 4
[[1]]
character-Rle of length 51 with 3 runs
Lengths: 40 2 9
Values : "M" "I" "M"
[[2]]
character-Rle of length 98 with 9 runs
Lengths: 3 15 55 4 2 6 2 5 6
Values : "H" "M" "N" "M" "I" "M" "D" "M" "S"
[[3]]
character-Rle of length 2027 with 4 runs
Lengths: 2 10 2000 15
Values : "S" "M" "N" "M"
[[4]]
character-Rle of length 41 with 3 runs
Lengths: 3 33 5
Values : "H" "M" "H"
> explodeCigarOpLengths(cigar2)
[[1]]
[1] 40 2 9
[[2]]
[1] 3 15 55 4 2 6 2 5 6
[[3]]
[1] 2 10 2000 15
[[4]]
[1] 3 33 5
> explodeCigarOps(cigar2)
[[1]]
[1] "M" "I" "M"
[[2]]
[1] "H" "M" "N" "M" "I" "M" "D" "M" "S"
[[3]]
[1] "S" "M" "N" "M"
[[4]]
[1] "H" "M" "H"
Note that explodeCigarOpLengths() and explodeCigarOps() return ordinary
lists. You can turn them into IntegerList and CharacterList,
respectively, with:
> cig_lens <- IntegerList(explodeCigarOpLengths(cigar2))
> cig_lens
IntegerList of length 4
[[1]] 40 2 9
[[2]] 3 15 55 4 2 6 2 5 6
[[3]] 2 10 2000 15
[[4]] 3 33 5
> cig_ops <- CharacterList(explodeCigarOps(cigar2))
> cig_ops
CharacterList of length 4
[[1]] M I M
[[2]] H M N M I M D M S
[[3]] S M N M
[[4]] H M H
The advantage of this representation over ordinary list is that then
you can do things like:
> cig_lens[cig_ops == "I"]
IntegerList of length 4
[[1]] 2
[[2]] 2
[[3]] integer(0)
[[4]] integer(0)
Many basic operations on IntegerList and CharacterList are optimized and
should be fast.
Please let me know if you need further help with this.
H.
sessionInfo()
R Under development (unstable) (2013-11-13 r64209)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicAlignments_0.99.18 Rsamtools_1.15.26
[3] Biostrings_2.31.12 XVector_0.3.6
[5] GenomicRanges_1.15.26 IRanges_1.21.23
[7] BiocGenerics_0.9.3
loaded via a namespace (and not attached):
[1] BatchJobs_1.1-1135 BBmisc_1.5 BiocParallel_0.5.8 bitops_1.0-6
[5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4
[9] fail_1.2 foreach_1.4.1 iterators_1.0.6 plyr_1.8
[13] RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 tools_3.1.0
[17] zlibbioc_1.9.0
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel