On 09/26/11 09:16, rohan bareja wrote:
Yes, you need to merge the counts for each gene.
see ?split
Split the counts on the gene IDs then sum with lapply. Something like,
lapply(split(df$counts, df$geneID), sum)
Valerie
Hi Valerie,
Thanks a lot..It worked finally.. So now I have a data frame for the
geneIds ,TranscriptIds and the counts (3'utr) which is given below:
GENE TX countsUTRctl
[1,] "148398" "1121" "2"
[2,] "339451" "1118" "0"
[3,]"84069" "1116" "0"
[4,] "84069" "1119" "11"
[5,] "9636" "1126" "11"
[6,] "375790" "1127" "0"
Now I want to do differential expression of genes using DESeq,so do I
have to merge the two same genes and its counts such as geneID 84069
(from above ) or i can proceed with the above dataframe?
If I have to merge them how do I do that?
Thanks,
Rohan
--- On *Sat, 24/9/11, Valerie Obenchain /<voben...@fhcrc.org>/* wrote:
From: Valerie Obenchain <voben...@fhcrc.org>
Subject: Re: [Bioc-sig-seq] Reads in 3'utr
To: "rohan bareja" <rohan_1...@yahoo.co.in>
Cc: bioc-sig-sequencing@r-project.org
Date: Saturday, 24 September, 2011, 4:24 AM
On 09/23/2011 02:57 PM, rohan bareja wrote:
Hi,
utr=threeUTRsByTranscript(txdb,use.names=FALSE)
So,utr is GRangesList of length 33381
Then as u said,I did the following:
txBygene <- transcriptsBy(txdb, "gene")
geneID <- rep(names(txBygene), elementLengths(txBygene))
df <- data.frame(geneID=geneID,
txID=values(unlist(txBygene))[["tx_id"]])
This gives me a dataframe with 40,780 rows with gene ID and txID
from txBygene object.
geneID txID
40775 9994 11731
40776 9994 11730
40777 9997 38491
40778 9997 38489
40779 9997 38496
40780 9997 38497
Since my utr object is of length 33,381 ,my counts length is same
i.e 33,381
So I am not able to map the counts to the above data frame which
has transcript and gene IDs.
Yes, these lengths are different.
In this example we have utr regions from 58 transcripts.
> length(utr)
[1] 58
Those 58 transcripts can be matched to their gene ID's by looking
at the txBygene object. All of the transcripts fall into one (or
more) of 51 genes,
> length(txBygene)
[1] 51
There are multiple transcripts per gene so we expand the gene ID's
to map to the transcripts.
> dim(df)
[1] 79 2
This data.frame has all transcripts from the txdb mapped to the
gene ID's. Your utr data may contain only a subset of these
transcripts. That is something you need to check. Match the
desired transcript names to the df, pull out the gene IDs. You
then have the gene ID's for your utr regions and can split or
group your counts by gene.
Valerie
--- On *Fri, 23/9/11, Valerie Obenchain /<voben...@fhcrc.org>
</mc/compose?to=voben...@fhcrc.org>/*wrote:
From: Valerie Obenchain <voben...@fhcrc.org>
</mc/compose?to=voben...@fhcrc.org>
Subject: Re: [Bioc-sig-seq] Reads in 3'utr
To: "rohan bareja" <rohan_1...@yahoo.co.in>
</mc/compose?to=rohan_1...@yahoo.co.in>
Cc: bioc-sig-sequencing@r-project.org
</mc/compose?to=bioc-sig-sequencing@r-project.org>
Date: Friday, 23 September, 2011, 10:50 PM
Hi Rohan,
You can relate the counts for 3UTR regions to gene IDs
through the transcript IDs.
txdb_file <- system.file("extdata",
"UCSC_knownGene_sample.sqlite", package="GenomicFeatures")
txdb <- loadFeatures(txdb_file)
utr=threeUTRsByTranscript(txdb,use.names=FALSE)
The transcript names can be matched to the gene ID's through,
txBygene <- transcriptsBy(txdb, "gene")
geneID <- rep(names(txBygene), elementLengths(txBygene))
df <- data.frame(geneID=geneID,
txID=values(unlist(txBygene))[["tx_id"]])
Now you know what gene ID each tx count belongs to. You can
split your counts by gene ID ...
Valerie
On 09/20/2011 12:13 PM, rohan bareja wrote:
Hi everyone,
I am doing NGS analysis using bam files.I have counted reads in 3'utr region using
utr=threeUTRsByTranscript(txdb,use.names=FALSE)
countsUTR<- countOverlaps(utr,reads)
I have got the transcript level counts from this.How can I get the gene
level counts??It might sound silly but Does anybody have an idea on what type
of anaylses we can do from this countsUTR ?
Thanks,Rohan
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing