Re: [Bioc-sig-seq] Reads in 3'utr

rohan bareja Tue, 27 Sep 2011 09:33:39 -0700

Hi Valerie,
Thanks a lot, it finally worked. I now have a data frame with the gene IDs,
transcript IDs, and the 3' UTR counts, shown below:
     GENE     TX     countsUTRctl
[1,] "148398" "1121" "2"
[2,] "339451" "1118" "0"
[3,] "84069"  "1116" "0"
[4,] "84069"  "1119" "11"
[5,] "9636"   "1126" "11"
[6,] "375790" "1127" "0"
Now I want to test for differential expression of genes using DESeq. Do I have
to merge the rows that belong to the same gene, along with their counts (for
example, gene ID 84069 above), or can I proceed with the data frame as it is?
If I have to merge them, how do I do that?

Thanks,
Rohan
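
One possible way to do the merge asked about above, as a minimal base R sketch rather than anything prescribed in this thread: convert the counts to integers and sum them within each gene ID. The object names `counts` and `geneCounts` are made up for illustration; the values are copied from the table above. Note that summing per-transcript counts may count a read more than once if the UTRs of two transcripts of the same gene overlap.

```r
# Toy copy of the table above; the counts are stored as character strings
# there, so they are converted to integers here.
counts <- data.frame(
  GENE = c("148398", "339451", "84069", "84069", "9636", "375790"),
  TX = c("1121", "1118", "1116", "1119", "1126", "1127"),
  countsUTRctl = as.integer(c("2", "0", "0", "11", "11", "0")),
  stringsAsFactors = FALSE
)

# Sum the transcript-level counts within each gene, giving one value per gene.
geneCounts <- tapply(counts$countsUTRctl, counts$GENE, sum)
geneCounts
```

For a gene-level analysis, each gene would then occupy a single row, and the same step would presumably be repeated for every sample before the per-sample columns are combined into one count table.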
--- On Sat, 24/9/11, Valerie Obenchain <voben...@fhcrc.org> wrote:


From: Valerie Obenchain <voben...@fhcrc.org>
Subject: Re: [Bioc-sig-seq] Reads in 3'utr
To: "rohan bareja" <rohan_1...@yahoo.co.in>
Cc: bioc-sig-sequencing@r-project.org
Date: Saturday, 24 September, 2011, 4:24 AM

    On 09/23/2011 02:57 PM, rohan bareja wrote:

        Hi,

        utr=threeUTRsByTranscript(txdb,use.names=FALSE)

        So utr is a GRangesList of length 33381.

        Then, as you said, I did the following:

        txBygene <- transcriptsBy(txdb, "gene")
        geneID <- rep(names(txBygene), elementLengths(txBygene))
        df <- data.frame(geneID=geneID,
                         txID=values(unlist(txBygene))[["tx_id"]])

        This gives me a data frame with 40,780 rows, containing the gene ID
        and txID from the txBygene object:

              geneID  txID
        40775   9994 11731
        40776   9994 11730
        40777   9997 38491
        40778   9997 38489
        40779   9997 38496
        40780   9997 38497

        Since my utr object has length 33,381, my counts vector has the same
        length, i.e. 33,381. So I am not able to map the counts to the above
        data frame, which has the transcript and gene IDs.


    Yes, these lengths are different.

    In this example we have utr regions from 58 transcripts.

    > length(utr)
    [1] 58

    Those 58 transcripts can be matched to their gene IDs by looking at
    the txBygene object. All of the transcripts fall into one (or more)
    of 51 genes.

    > length(txBygene)
    [1] 51

    There are multiple transcripts per gene, so we expand the gene IDs
    to map to the transcripts.

    > dim(df)
    [1] 79  2
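
To illustrate the expansion step just described, here is a small base R sketch. It is an illustration only: `txByGeneToy` is a hypothetical plain list standing in for the txBygene object (the gene and transcript IDs are borrowed from the data frame shown earlier in the thread), and base `lengths()` plays the role that `elementLengths()` plays for the real object.

```r
# Hypothetical stand-in for txBygene: one list element per gene,
# each holding that gene's transcript IDs.
txByGeneToy <- list("9994" = c(11731L, 11730L),
                    "9997" = c(38491L, 38489L, 38496L, 38497L))

# Repeat each gene ID once per transcript so that the gene column
# lines up with the flattened transcript column.
geneID <- rep(names(txByGeneToy), lengths(txByGeneToy))
dfToy <- data.frame(geneID = geneID,
                    txID = unlist(txByGeneToy, use.names = FALSE))
dim(dfToy)
```

The resulting table has one row per transcript, not one row per gene, which is why its row count differs from the number of genes.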

    

    This data.frame has all transcripts from the txdb mapped to the gene
    IDs. Your utr data may contain only a subset of these transcripts.
    That is something you need to check. Match the desired transcript
    names to the df and pull out the gene IDs. You then have the gene IDs
    for your utr regions and can split or group your counts by gene.
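
The matching and grouping described above could look roughly like the following base R sketch. Everything in it is hypothetical toy data: `dfToy` stands in for df, `countsToy` stands in for the per-transcript counts, and the gene names `g1` to `g3` are invented. It also assumes the counts carry transcript IDs as their names, which is how names(utr) should behave with use.names=FALSE, though that is worth checking on the real object.

```r
# Hypothetical transcript-to-gene table (stand-in for df).
dfToy <- data.frame(geneID = c("g1", "g1", "g2", "g3"),
                    txID = c(101L, 102L, 103L, 104L),
                    stringsAsFactors = FALSE)

# Hypothetical per-transcript counts, named by transcript ID.
# Only a subset of the transcripts in dfToy is present.
countsToy <- c("101" = 5L, "103" = 7L, "102" = 1L)

# Look up each counted transcript in the table and pull out its gene ID.
idx <- match(as.integer(names(countsToy)), dfToy$txID)
stopifnot(!anyNA(idx))  # every counted transcript must appear in the table
geneOfCount <- dfToy$geneID[idx]

# Group the counts by gene, then sum within each group.
countsByGene <- split(countsToy, geneOfCount)
geneTotals <- vapply(countsByGene, sum, integer(1))
geneTotals
```

Genes with no counted transcript (here `g3`) simply do not appear in the result, which mirrors the point that the utr data may cover only a subset of the transcripts.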

    

    Valerie
        --- On Fri, 23/9/11, Valerie Obenchain <voben...@fhcrc.org> wrote:

        From: Valerie Obenchain <voben...@fhcrc.org>
        Subject: Re: [Bioc-sig-seq] Reads in 3'utr
        To: "rohan bareja" <rohan_1...@yahoo.co.in>
        Cc: bioc-sig-sequencing@r-project.org
        Date: Friday, 23 September, 2011, 10:50 PM

                  

        Hi Rohan,

        You can relate the counts for 3' UTR regions to gene IDs through
        the transcript IDs.

            txdb_file <- system.file("extdata",
                                     "UCSC_knownGene_sample.sqlite",
                                     package="GenomicFeatures")
            txdb <- loadFeatures(txdb_file)
            utr=threeUTRsByTranscript(txdb,use.names=FALSE)

        The transcript names can be matched to the gene IDs through

            txBygene <- transcriptsBy(txdb, "gene")
            geneID <- rep(names(txBygene), elementLengths(txBygene))
            df <- data.frame(geneID=geneID,
                             txID=values(unlist(txBygene))[["tx_id"]])

        Now you know what gene ID each tx count belongs to.
        You can split your counts by gene ID ...

        Valerie

                    

                    

                    

        On 09/20/2011 12:13 PM, rohan bareja wrote:

            Hi everyone,

            I am doing NGS analysis using BAM files. I have counted reads in
            the 3' UTR regions using

            utr=threeUTRsByTranscript(txdb,use.names=FALSE)
            countsUTR <- countOverlaps(utr,reads)

            This gives me transcript-level counts. How can I get gene-level
            counts? It might sound silly, but does anybody have an idea of
            what type of analyses we can do with countsUTR?

            Thanks,
            Rohan
        [[alternative HTML version deleted]]


                      
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing