On 06/29/2011 04:18 PM, Kunbin Qu wrote:
Hi,

When using GenomicFeatures, how can I get gene symbols for the transcripts
built from hg19? I ran the following commands and expected to see names like
"TP53", "MMP11", "UBE2E", etc., but names() returned only numbers, which
are not very informative.

-Kunbin


hg19kg<-makeTranscriptDbFromUCSC(genome="hg19", tablename="knownGene")
GR<-transcripts(hg19kg, vals=list(tx_chrom="chr1", tx_strand="+"))
GRList<-transcriptsBy(hg19kg, by="gene")
names(GRList)[1:20]
  [1] "1"         "10"        "100"       "1000"      "10000"     "100008586"
  [7] "100008587" "100009676" "10001"     "10002"     "10003"     "100033413"
[13] "100033414" "100033415" "100033416" "100033417" "100033420" "100033422"
[19] "100033423" "100033424"

Hi Kunbin

These are ENTREZ gene ids, and you're after (the much more ambiguous) SYMBOL identifiers. Use

  nms <- names(GRList)[1:20]
  library(org.Hs.eg.db)
  map <- org.Hs.egSYMBOL
  toTable(map[nms])

or maybe mget(nms, map, ifnotfound=NA) and then process the resulting list.
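
To illustrate what "processing" the mget() result might look like, here is a minimal base-R sketch. It does not load org.Hs.eg.db: a plain environment named toy_map (a hypothetical stand-in, with a few ID/symbol pairs typed in by hand) plays the role of the annotation map, and "999999999" is a made-up ID with no entry. With the real map the call would presumably take the same form, but that is not tested here.

```r
# Toy stand-in for the annotation map: an environment keyed by Entrez gene ID.
# The ID/symbol pairs are entered by hand for illustration only.
toy_map <- new.env()
assign("1", "A1BG", envir = toy_map)
assign("10", "NAT2", envir = toy_map)
assign("100", "ADA", envir = toy_map)

# IDs as they would come from names(GRList); the last one is a made-up ID
# with no entry, to show the effect of ifnotfound = NA.
nms <- c("1", "10", "100", "999999999")

# mget() returns a named list with one element per requested ID; IDs without
# an entry yield NA instead of raising an error.
hits <- mget(nms, envir = toy_map, ifnotfound = NA)

# Flatten the list to a character vector named by Entrez ID.
symbols <- unlist(hits)

# Fall back to the ID itself where no symbol was found, so every element
# still carries a usable label.
labels <- ifelse(is.na(symbols), names(symbols), symbols)
print(labels)
```

Falling back to the Entrez ID is only one possible choice; dropping unmapped IDs, or keeping the NA, may suit other analyses better.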

Martin

sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] tcltk     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
  [1] GenomicFeatures_1.0.4 DEGseq_2.0.0          samr_1.28
  [4] impute_1.24.0         ShortRead_1.6.2       Rsamtools_1.0.1
  [7] lattice_0.19-11       Biostrings_2.16.7     GenomicRanges_1.0.1
[10] IRanges_1.6.8         qvalue_1.22.0

loaded via a namespace (and not attached):
  [1] Biobase_2.8.0     biomaRt_2.4.0     BSgenome_1.17.1   DBI_0.2-5
  [5] grid_2.11.0       hwriter_1.3       RCurl_1.4-3       RSQLite_0.9-2
  [9] rtracklayer_1.8.1 tools_2.11.0      XML_3.1-1




_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
