Michael, I am terribly sorry. Steve Lianoglou on this group has been helping me with my previous questions. I messed up and thought it was him responding my emails. My sincere apology, Michael.
-Kunbin From: Michael Lawrence [mailto:lawrence.mich...@gene.com] Sent: Friday, July 22, 2011 4:13 PM To: Kunbin Qu Cc: Michael Lawrence; bioc-sig-sequencing@r-project.org Subject: Re: [Bioc-sig-seq] count the coverage base by base Just call coverage() on the GRanges. And just for the record, my name is not Steve :) Michael On Fri, Jul 22, 2011 at 4:03 PM, Kunbin Qu <k...@genomichealth.com<mailto:k...@genomichealth.com>> wrote: Steve, the reads are GRanges, like the following. If the reads go across two exons, it would be two GRanges. I was using CASAVA1.7 (ie, ELAND2.0) to generate the alignment. In the export file from CASAVA, it has this weird representation of the spliced reads: SEQUENCER02 110 1 1101 4693 1950 TTAGGC 1 TTGCTGCAAGCATTTGAGAACAACCTTTTTCGTGCT `acccbcccggggggggggggggggggggggggggg splice_sites-auto.fa TAF1_35_35_chrX.fa_70604899_70607111 27 F 36 118 Y So I wrote a script to convert the above single entry (one read) into two entries, one for each portion falling into the two exons where the read goes across. Then read them into R as GRanges. Could you show me a little more how to coverage functions to get the counts? Thanks. -Kunbin > gr GRanges with 27421835 ranges and 0 elementMetadata values seqnames ranges strand | <Rle> <IRanges> <Rle> | [1] chr11 [ 48034060, 48034095] + | [2] chr13 [103319962, 103319997] + | [3] chr2 [198350561, 198350596] - | [4] chr12 [ 41850809, 41850844] + | [5] chr16 [ 89974865, 89974900] - | [6] chr1 [172113839, 172113874] - | [7] chr12 [111080272, 111080307] - | [8] chr2 [179445437, 179445472] - | [9] chr10 [119817069, 119817104] + | ... ... ... ... ... [27421827] chr17 [ 43334904, 43334939] - | [27421828] chr6 [163903657, 163903692] + | [27421829] chr18 [ 74737099, 74737134] - | [27421830] chr13 [ 78311617, 78311652] - | [27421831] chr4 [170541832, 170541867] + | [27421832] chr13 [ 32601946, 32601981] + | [27421833] chr3 [ 38420019, 38420054] + | [27421834] chr8 [ 74561716, 74561751] - | [27421835] chr3 [ 39319812, 39319847] - | seqlengths chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX chrY NA NA NA NA NA NA ... NA NA NA NA NA NA > From: Michael Lawrence [mailto:lawrence.mich...@gene.com<mailto:lawrence.mich...@gene.com>] Sent: Friday, July 22, 2011 3:32 PM To: Kunbin Qu Cc: Michael Lawrence; bioc-sig-sequencing@r-project.org<mailto:bioc-sig-sequencing@r-project.org> Subject: Re: [Bioc-sig-seq] count the coverage base by base You could call coverage on all sorts of things. How are you representing your reads? GappedAlignments would work, for example. > showMethods(coverage) Function: coverage (package IRanges) x="AlignedXStringSet0" x="GRangesList" x="GappedAlignments" x="GenomicRanges" x="IRanges" x="MIndex" x="MaskCollection" x="MaskedXString" x="PairwiseAlignedFixedSubject" x="PairwiseAlignedFixedSubjectSummary" x="RangedData" x="RangesList" x="Views" x="numeric" On Fri, Jul 22, 2011 at 3:24 PM, Kunbin Qu <k...@genomichealth.com<mailto:k...@genomichealth.com>> wrote: Steve, thanks for the advice. But I did not get it: coverage() is applied to a set of IRanges. What should I use for the input of the coverage() then? Is it possible for you to show me a little pseudo or real code? Thanks. -Kunbin From: Michael Lawrence [mailto:lawrence.mich...@gene.com<mailto:lawrence.mich...@gene.com>] Sent: Friday, July 22, 2011 3:15 PM To: Kunbin Qu Cc: bioc-sig-sequencing@r-project.org<mailto:bioc-sig-sequencing@r-project.org> Subject: Re: [Bioc-sig-seq] count the coverage base by base coverage(), form Views() on the coverage for your genes and then viewSums(). On Fri, Jul 22, 2011 at 2:36 PM, Kunbin Qu <k...@genomichealth.com<mailto:k...@genomichealth.com>> wrote: Hi, If I want to count the coverage base by base, not read by read, can I still use countOverlaps? I have a human transcriptome done, and would like to count the coverage for each gene based on the mapping. As there are some reads mapped across the junctions (ie, one read is splitted into two portions), or partially mapped into the introns, it would be good to count the coverage by base pair, instead of by read number? Could anybody suggest a way doing that? Thanks. -Kunbin ______________________________________________________________________ The contents of this electronic message, including any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain confidential information. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this message or any attachment is strictly prohibited. If you have received this transmission in error, please send an e-mail to postmas...@genomichealth.com<mailto:postmas...@genomichealth.com> and delete this message, along with any attachments, from your computer. [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org<mailto:Bioc-sig-sequencing@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing ______________________________________________________________________ The contents of this electronic message, including any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain confidential information. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this message or any attachment is strictly prohibited. If you have received this transmission in error, please send an e-mail to postmas...@genomichealth.com<mailto:postmas...@genomichealth.com> and delete this message, along with any attachments, from your computer. ______________________________________________________________________ The contents of this electronic message, including any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain confidential information. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this message or any attachment is strictly prohibited. If you have received this transmission in error, please send an e-mail to postmas...@genomichealth.com<mailto:postmas...@genomichealth.com> and delete this message, along with any attachments, from your computer. ______________________________________________________________________ The contents of this electronic message, including any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain confidential information. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this message or any attachment is strictly prohibited. If you have received this transmission in error, please send an e-mail to postmas...@genomichealth.com and delete this message, along with any attachments, from your computer. [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing