Hi Wade.

I think the problem lies with the 'ragene10stprobeset.db' library (note: probeset). How about trying the symbols from the 'ragene10sttranscriptcluster.db' package:

http://www.bioconductor.org/packages/release/data/annotation/html/ragene10sttranscriptcluster.db.html

I can't remember when this change was made, but my 'hugene10st.db' example is now outdated. You should use hugene10stprobeset.db for probesets or hugene10sttranscriptcluster.db for transcript clusters.

Hope that helps.

Cheers,
Mark

On 17-Dec-09, at 10:09 AM, Wade D wrote:

> Hi Mark and others,
> I am in a somewhat similar situation to the original person who started
> this discussion, so I am tacking on my question to your response from
> February.
>
> This is my first ST analysis, and I am using the Rat gene 1.0 ST. I
> followed the example given at
>
> http://groups.google.com/group/aroma-affymetrix/web/gene-1-0-st-array-analysis
> and everything has worked fine so far.
>
> Now, I would like to annotate my gene-level summaries. I tried using
> methods I typically use (from the annotate package) with
> ragene10stprobeset.db, but things didn't seem right. So I figured it
> was me, and I came back to the group help pages and found your post.
> Mimicking it below, it seems that I've either done something wrong, or
> there is a problem with ragene10stprobeset.db.
>
> > library(ragene10stprobeset.db)
> > symbols <- unlist(as.list(ragene10stprobesetSYMBOL))
> > myids<-gExprs[,1]
> > head(myids)
> [1] "10700001" "10700003" "10700004" "10700005" "10700013" "10700014"
> >
> > temp<-data.frame(affyid = myids,symbol = symbols[myids])
> > #temp[!is.na(temp$symbol),]
> >
> > sum(!is.na(temp$symbol))
> [1] 237
>
> This is a disturbingly low number, so I figure something is amiss.
> Following your lead, I compare the CDF with what is on Affy's website
> in the transcript and probeset files...
>
> > tr <- read.csv("RaGene-1_0-st-v1.na30.1.rn4.transcript.csv",header=TRUE,comment.char="#")
> > ps <- read.csv("RaGene-1_0-st-v1.na30.rn4.probeset.csv",header=TRUE,comment.char="#")
> > #chipType <- "RaGene-1_0-st-v1"
> > #cdf <- AffymetrixCdfFile$byChipType(chipType, tags="r3")
> > un <- getUnitNames(cdf)
> > sum( un %in% ps$transcript_cluster_id )
> [1] 27342
> >
> > sum( un %in% tr$transcript_cluster_id )
> [1] 29169
>
> Everything looks reasonable here.
>
> > sum(names(symbols) %in% ps$transcript_cluster_id )
> [1] 0
> > sum(names(symbols) %in% tr$transcript_cluster_id )
> [1] 1872
>
> This is the problem, it seems.
>
> I wanted to ask others before I build my own annotation.db for
> ragene10st. I've done it for Illumina arrays before, but it has been
> a while, and it is a little bit of a pain for Windows users to do. Just
> wanted to get a second opinion before I go down that road, especially
> since this is my first time dealing with ST arrays.
>
> Thanks,
> Wade
>
> On Feb 10, 3:13 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>> Hi Simon.
>>
>> See comments below.
>>
>>> I am using the mouse gene ST arrays and am having problems with
>>> annotation. When I write a csv file, the annotation is only the
>>> probeset_id, no gene names or accession numbers etc.
>>
>> That's what it should be. Actually, it's the 'transcript_cluster_id'.
>> Previously, Affy did not provide annotation at the "probeset" level.
>>
>> The CDF file just contains the identifiers. Linking results (e.g.
>> expression summaries) to the annotation can be done with other R
>> packages.
>> For example, here is some code I gave Sebastien a few weeks
>> ago that will get you started (just replace hugene10st.db with
>> mogene10st.db):
>>
>> -------------
>> Say you have some Affy identifiers:
>>
>> > myids
>>  [1] "7950136" "7955845" "7955852" "7955855" "7955858" "7955865" "7955869"
>>  [8] "7955873" "7955887" "8016433"
>>
>> Load package and read off the gene symbols:
>>
>> > library(hugene10st.db)
>> > symbols <- unlist(as.list(hugene10stSYMBOL))
>> > data.frame(affyid = myids,symbol = symbols[myids])
>>          affyid symbol
>> 7950136 7950136 PHOX2A
>> 7955845 7955845 HOXC13
>> 7955852 7955852 HOXC12
>> 7955855 7955855 HOXC11
>> 7955858 7955858 HOXC10
>> 7955865 7955865  HOXC9
>> 7955869 7955869  HOXC8
>> 7955873 7955873  HOXC6
>> 7955887 7955887  HOXC5
>> 8016433 8016433  HOXB1
>>
>> Here are some other fields in hugene10st.db:
>>
>> > hugene10st
>> hugene10st               hugene10st.db::          hugene10stACCNUM
>> hugene10stALIAS2PROBE    hugene10stCHR            hugene10stCHRLENGTHS
>> hugene10stCHRLOC         hugene10stCHRLOCEND      hugene10stENSEMBL
>> hugene10stENSEMBL2PROBE  hugene10stENTREZID       hugene10stENZYME
>> hugene10stENZYME2PROBE   hugene10stGENENAME       hugene10stGO
>> hugene10stGO2ALLPROBES   hugene10stGO2PROBE       hugene10stMAP
>> hugene10stMAPCOUNTS      hugene10stOMIM           hugene10stORGANISM
>> hugene10stPATH           hugene10stPATH2PROBE     hugene10stPFAM
>> hugene10stPMID           hugene10stPMID2PROBE     hugene10stPROSITE
>> hugene10stREFSEQ         hugene10stSYMBOL         hugene10stUNIGENE
>> hugene10stUNIPROT        hugene10st_dbInfo        hugene10st_dbconn
>> hugene10st_dbfile        hugene10st_dbschema
>>
>> ...
>> -------------
>>
>>> These probesets
>>> also do not match the probeset_ids from MoGene-1_0-st-v1.na27.mm9
>>> off the affymetrix website.
>>
>> Perhaps you want 'transcript_cluster_id's?
>>
>> (CSV files from
>> http://www.affymetrix.com/products_services/arrays/specific/mousegene...)
>>
>> > tr <- read.csv("MoGene-1_0-st-v1.na27.mm9.transcript.csv",header=TRUE,comment.char="#")
>> > ps <- read.csv("MoGene-1_0-st-v1.na27.mm9.probeset.csv",header=TRUE,comment.char="#")
>>
>> > cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",verbose=verbose)
>> > un <- getUnitNames(cdf)
>> > sum( un %in% ps$transcript_cluster_id )
>> [1] 28815
>> > sum( un %in% tr$transcript_cluster_id )
>> [1] 35474
>>
>> You may also be interested in the following thread, which explains
>> the difference in number of probesets:
>> http://thread.gmane.org/gmane.science.biology.informatics.conductor/1
>>
>> ...
>>
>>> here is my session:
>>
>>> > library('aroma.affymetrix')
>>> > cdf <- AffymetrixCdfFile$byChipType("MoGene-1_0-st-v1",tags='r3')
>>> > cs <- AffymetrixCelSet$byName("Files", cdf=cdf)
>>> > bc <- RmaBackgroundCorrection(cs)
>>> > csBC <- process(bc,verbose=verbose)
>>> > qn <- QuantileNormalization(csBC, typesToUpdate="pm")
>>> > csN <- process(qn, verbose=verbose)
>>> > plm <- RmaPlm(csN)
>>> > fit(plm, verbose=verbose)
>>> > qam <- QualityAssessmentModel(plm)
>>> > ces <- getChipEffectSet(plm)
>>> > mat <- extractMatrix(ces)
>>> > mat <- log2(mat)
>>> > rownames(mat) <- getUnitNames(cdf)
>>> > write.csv(mat, file="data.csv")
>>
>>> I am sure there is a simple solution to this and I apologize as I am
>>> new to "R". Any help would be much appreciated. Also, what are
>>> people's opinions on the "positive" and "negative controls" probesets?
>>> Should these be included as part of a final gene list?
>>> Thank you in advance for any help.
>>
>> Good question. Some people use the controls for QC and some use them
>> for adjusting for background (for example, the pool of GC content
>> probes). But if you were to follow this up with some kind of
>> differential expression analysis (e.g. limma), I would definitely
>> discard the non-"main" probes.
>> For example:
>>
>> > table(tr$category)
>>
>>            control->affx control->bgp->antigenomic                     main
>>                       22                        45                    28815
>>           normgene->exon          normgene->intron rescue->FLmRNA->unmapped
>>                     1324                      5222                       91
>>
>> Hope that helps.
>> Mark
>>
>> ------------------------------
>> Mark Robinson
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: m.robin...@garvan.org.au
>> e: mrobin...@wehi.edu.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>> ------------------------------
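
A minimal sketch of the lookup suggested at the top of this thread, written so that it can be tried without any annotation package installed. The helper `annotate_ids` and the toy vectors are hypothetical names and made-up values introduced only for illustration. The guarded section at the end assumes that ragene10sttranscriptcluster.db exposes a `ragene10sttranscriptclusterSYMBOL` map analogous to the `ragene10stprobesetSYMBOL` map used in Wade's post; that name is an assumption and should be checked against the installed package.

```r
# Hypothetical helper: look up gene symbols for a vector of unit names and
# record how many of them were found in the annotation map.
annotate_ids <- function(ids, symbols) {
  ids <- as.character(ids)
  # match() returns NA for ids that are absent from the map, so those rows
  # end up with an NA symbol instead of raising an error.
  hit <- symbols[match(ids, names(symbols))]
  out <- data.frame(affyid = ids, symbol = unname(hit),
                    stringsAsFactors = FALSE)
  attr(out, "n_annotated") <- sum(!is.na(out$symbol))
  out
}

# Toy data (made up for illustration; not real annotation content).
toy_symbols <- c("10700001" = "GeneA", "10700003" = "GeneB")
toy_ids <- c("10700001", "10700003", "10700004")
toy <- annotate_ids(toy_ids, toy_symbols)
print(toy)
cat("annotated:", attr(toy, "n_annotated"), "of", nrow(toy), "\n")

# Real lookup, only attempted if the Bioconductor package is installed.
# The map name below is assumed by analogy with ragene10stprobesetSYMBOL.
pkg <- "ragene10sttranscriptcluster.db"
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)
  symbols <- unlist(as.list(ragene10sttranscriptclusterSYMBOL))
  # 'gExprs' is the object from Wade's post; uncomment once it exists.
  # annotated <- annotate_ids(gExprs[, 1], symbols)
  # attr(annotated, "n_annotated")
}
```

If the annotated count is still low with the transcript-cluster map, comparing `names(symbols)` against the `transcript_cluster_id` column of the Affymetrix CSV files, as Wade did above, is a quick way to tell which identifier level a given package uses.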