Hi Ivan,
Ivan Gregoretti wrote:
Hello everyone,
It is very easy to display one sequence of DNA from the mouse genome.
For example
library(BSgenome.Mmusculus.UCSC.mm9)
DNAString(Mmusculus$chr1)[100000000:100000050]
51-letter "DNAString" instance
seq: GGACTGCTGTTGCTGATTCATGTTTGATGTTTTAGACTGCTAATATCCTGA
My question:
Now let's say I have a BED-like list of genomic spaces like this
head(A[ , c("chr", "start", "end")])
   chr   start     end
1 chr1 3644952 3649720
2 chr1 4599146 4601342
3 chr1 5015865 5018830
4 chr1 5072928 5076881
5 chr1 5504220 5507065
6 chr1 5513886 5516391
How do I display many sequences from different chromosomes?
DNAStringSet(sapply(seq_len(nrow(A)),
                    function(i)
                        getSeq(Mmusculus,
                               as.vector(A$chr[i]),
                               start=A$start[i], end=A$end[i])))
I think you have a fairly reasonable use-case here so I'm going to work
on vectorizing getSeq() so you'll be able to do something like:
getSeq(Mmusculus, as.vector(A$chr), start=A$start, end=A$end)
to get the same thing.
Another question:
I wish to add these sequences to my BED-like data.frame as a new
field. How do I convert them to strings?
Then don't call DNAStringSet() on what's returned by sapply() in the above
code.
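For the data.frame case, here is a minimal sketch of what that could look like. It is only an illustration under a few assumptions: it rebuilds A from the first three rows you showed, it passes as.character=TRUE explicitly on the assumption that your version of getSeq() accepts that argument, and the column name "seq" is just a placeholder.

```r
library(BSgenome.Mmusculus.UCSC.mm9)

# First three rows of the BED-like table shown above (rebuilt by hand here).
A <- data.frame(chr   = c("chr1", "chr1", "chr1"),
                start = c(3644952, 4599146, 5015865),
                end   = c(3649720, 4601342, 5018830),
                stringsAsFactors = FALSE)

# One getSeq() call per row. as.character=TRUE asks for a plain string
# (assumption: the installed BSgenome supports this argument).
# vapply() checks that each call returns exactly one character string.
A$seq <- vapply(seq_len(nrow(A)),
                function(i)
                    getSeq(Mmusculus,
                           as.vector(A$chr[i]),
                           start=A$start[i], end=A$end[i],
                           as.character=TRUE),
                character(1))

# Each string should be end - start + 1 letters long.
nchar(A$seq)
```

Using vapply() instead of sapply() is just a safety net: it fails loudly if any call returns something other than a single string, so the new column always has one entry per row.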
In my defense:
The first question is not covered in the documentation of
BSgenome.Mmusculus.UCSC.mm9.
Right. But since this is a generic BSgenome question, a more appropriate
place to cover it is the documentation of the BSgenome package itself. I'll
cover this in ?getSeq.
Thanks for your feedback.
H.
Thank you,
Ivan
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: [email protected]
Phone: (206) 667-5791
Fax: (206) 667-1319