On 06/29/2011 04:18 PM, Kunbin Qu wrote:
Hi,
Using GenomicFeatures, how can I get the gene symbols for transcripts built from hg19? I
ran the following commands and expected to see names like "TP53", "MMP11",
"UBE2E" etc., but names() returned only numbers (which did not add much value).
-Kunbin
hg19kg<-makeTranscriptDbFromUCSC(genome="hg19", tablename="knownGene")
GR<-transcripts(hg19kg, vals=list(tx_chrom="chr1", tx_strand="+"))
GRList<-transcriptsBy(hg19kg, by="gene")
names(GRList)[1:20]
[1] "1" "10" "100" "1000" "10000" "100008586"
[7] "100008587" "100009676" "10001" "10002" "10003" "100033413"
[13] "100033414" "100033415" "100033416" "100033417" "100033420" "100033422"
[19] "100033423" "100033424"
Hi Kunbin
These are ENTREZ gene ids, and you're after (the much more ambiguous)
SYMBOL identifiers. Use
nms <- names(GRList)[1:20]
library(org.Hs.eg.db)
map <- org.Hs.egSYMBOL
toTable(map[nms])
or maybe mget(nms, map, ifnotfound=NA) and then post-process the resulting list
Martin
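
A minimal sketch of the "post-process" step for the mget() route, in base R only. The environment toy_map is a hypothetical stand-in for org.Hs.egSYMBOL (it holds three Entrez-id-to-symbol pairs typed in by hand), and "not_an_id" is a made-up key used to show what ifnotfound=NA does. With the real annotation package the lookup result may differ in detail, so treat this as an illustration rather than the definitive recipe.

```r
# Hypothetical stand-in for org.Hs.egSYMBOL: Entrez gene id -> gene symbol.
toy_map <- new.env(parent = emptyenv())
assign("1",   "A1BG", envir = toy_map)
assign("10",  "NAT2", envir = toy_map)
assign("100", "ADA",  envir = toy_map)

# Ids to look up; "not_an_id" is deliberately absent from the map.
nms <- c("1", "10", "100", "not_an_id")

# mget() returns a named list; ids without a match get NA instead of an error.
hits <- mget(nms, envir = toy_map, ifnotfound = NA)

# Flatten to a named character vector (one symbol per id in this toy case).
symbols <- unlist(hits)

# Two-column table of ids and symbols, keeping unmatched ids as NA.
tbl <- data.frame(gene_id = names(symbols),
                  symbol  = unname(symbols),
                  stringsAsFactors = FALSE)
print(tbl)
```

Keeping the unmatched ids as NA rows makes it easy to see which Entrez ids have no symbol before deciding whether to drop them.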
sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] tcltk stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicFeatures_1.0.4 DEGseq_2.0.0 samr_1.28
[4] impute_1.24.0 ShortRead_1.6.2 Rsamtools_1.0.1
[7] lattice_0.19-11 Biostrings_2.16.7 GenomicRanges_1.0.1
[10] IRanges_1.6.8 qvalue_1.22.0
loaded via a namespace (and not attached):
[1] Biobase_2.8.0 biomaRt_2.4.0 BSgenome_1.17.1 DBI_0.2-5
[5] grid_2.11.0 hwriter_1.3 RCurl_1.4-3 RSQLite_0.9-2
[9] rtracklayer_1.8.1 tools_2.11.0 XML_3.1-1
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793