Re: [Bioc-sig-seq] Reads in 3'utr

Valerie Obenchain Fri, 23 Sep 2011 15:54:46 -0700

On 09/23/2011 02:57 PM, rohan bareja wrote:
> Hi,
>
> utr=threeUTRsByTranscript(txdb,use.names=FALSE)
> So, utr is a GRangesList of length 33,381.
> Then, as you said, I did the following:
>
> txBygene <- transcriptsBy(txdb, "gene")
> geneID <- rep(names(txBygene), elementLengths(txBygene))
> df <- data.frame(geneID=geneID,
>                  txID=values(unlist(txBygene))[["tx_id"]])
>
> This gives me a data frame with 40,780 rows holding the gene ID and
> txID from the txBygene object.
>       geneID  txID
> 40775   9994 11731
> 40776   9994 11730
> 40777   9997 38491
> 40778   9997 38489
> 40779   9997 38496
> 40780   9997 38497
>
> Since my utr object has length 33,381, my counts vector has the same
> length, i.e. 33,381.
> So I am not able to map the counts to the above data frame, which has
> the transcript and gene IDs.
>


Yes, these lengths are different.

In the sample txdb from my earlier example, we have UTR regions from 58 transcripts.

 > length(utr)
[1] 58


Those 58 transcripts can be matched to their gene IDs by looking at the
txBygene object. All of the transcripts fall into one (or more) of 51
genes:

 > length(txBygene)
[1] 51

There are multiple transcripts per gene, so we expand the gene IDs to
give one row per transcript:

 > dim(df)
[1] 79  2

This data.frame has all transcripts from the txdb mapped to the gene
IDs. Your utr data may contain only a subset of these transcripts; that
is something you need to check. Match the transcript names in your utr
object to the txID column of df and pull out the corresponding gene IDs.
You then have the gene IDs for your UTR regions and can split or group
your counts by gene.
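
The matching and grouping steps could look roughly like the sketch below.
It uses base R only and small made-up values in place of the real objects,
so the IDs and counts are hypothetical. It also assumes the counts vector
is named by transcript ID (for example via names(countsUTR) <- names(utr));
that naming step is an assumption, not something shown in this thread.

```r
# Toy stand-ins (hypothetical values), not real annotation data.
# df mimics the gene/transcript table built from txBygene.
df <- data.frame(geneID = c("g1", "g1", "g2", "g3"),
                 txID   = c(101L, 102L, 201L, 301L),
                 stringsAsFactors = FALSE)

# countsUTR mimics the transcript-level counts, named by transcript ID
# (assumed). Only a subset of the transcripts in df is present.
countsUTR <- c("101" = 5L, "102" = 3L, "201" = 7L)

# Look up the row of df for each counted transcript.
idx <- match(names(countsUTR), as.character(df$txID))

# Check that every counted transcript was found before grouping.
stopifnot(!anyNA(idx))

# Gene ID for each transcript-level count, in the same order as countsUTR.
geneForTx <- df$geneID[idx]

# Sum the transcript-level counts within each gene.
geneCounts <- tapply(countsUTR, geneForTx, sum)
```

Summing this way counts a read once for every transcript whose UTR it
overlaps, so reads in a UTR shared by several transcripts of one gene
contribute more than once to that gene's total.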

Valerie
>
>
>
> --- On Fri, 23/9/11, Valerie Obenchain <voben...@fhcrc.org> wrote:
>
>
>     From: Valerie Obenchain <voben...@fhcrc.org>
>     Subject: Re: [Bioc-sig-seq] Reads in 3'utr
>     To: "rohan bareja" <rohan_1...@yahoo.co.in>
>     Cc: bioc-sig-sequencing@r-project.org
>     Date: Friday, 23 September, 2011, 10:50 PM
>
>     Hi Rohan,
>
>     You can relate the counts for the 3' UTR regions to gene IDs through
>     the transcript IDs.
>
>         txdb_file <- system.file("extdata", "UCSC_knownGene_sample.sqlite",
>                                  package="GenomicFeatures")
>         txdb <- loadFeatures(txdb_file)
>         utr <- threeUTRsByTranscript(txdb, use.names=FALSE)
>
>
>     The transcript names can be matched to the gene IDs through:
>
>         txBygene <- transcriptsBy(txdb, "gene")
>         geneID <- rep(names(txBygene), elementLengths(txBygene))
>         df <- data.frame(geneID=geneID,
>                          txID=values(unlist(txBygene))[["tx_id"]])
>
>     Now you know what gene ID each tx count belongs to. You can split
>     your counts by gene ID ...
>
>
>     Valerie
>
>
>
>     On 09/20/2011 12:13 PM, rohan bareja wrote:
>>     Hi everyone,
>>     I am doing NGS analysis using BAM files. I have counted reads in the
>>     3' UTR regions using
>>     utr=threeUTRsByTranscript(txdb,use.names=FALSE)
>>     countsUTR <- countOverlaps(utr,reads)
>>     This gives me transcript-level counts. How can I get gene-level
>>     counts? It might sound silly, but does anybody have an idea of what
>>     type of analyses we can do with countsUTR?
>>     Thanks, Rohan
>>
>>
>>
>>     _______________________________________________
>>     Bioc-sig-sequencing mailing list
>>     Bioc-sig-sequencing@r-project.org
>>     https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>



