Hi Simon.
See comments below.
> I am using the mouse gene ST arrays and am having problems with
> annotation. When I write a csv file, the annotation is only the
> probeset_id, no gene names or accession numbers etc.
That's what it should be. Actually, it's the 'transcript_cluster_id'.
Previously, Affy did not provide annotation at the "probeset" level.
The CDF file just contains the identifiers. Linking results (e.g.
expression summaries) to the annotation can be done with other R
packages. For example, here is some code I gave Sebastien a few weeks
ago that will get you started (just replace hugene10st.db with
mogene10st.db):
-------------
Say you have some Affy identifiers:
> myids
 [1] "7950136" "7955845" "7955852" "7955855" "7955858" "7955865" "7955869"
 [8] "7955873" "7955887" "8016433"
Load package and read off the gene symbols:
> library(hugene10st.db)
> symbols <- unlist(as.list(hugene10stSYMBOL))
> data.frame(affyid = myids,symbol = symbols[myids])
         affyid symbol
7950136 7950136 PHOX2A
7955845 7955845 HOXC13
7955852 7955852 HOXC12
7955855 7955855 HOXC11
7955858 7955858 HOXC10
7955865 7955865  HOXC9
7955869 7955869  HOXC8
7955873 7955873  HOXC6
7955887 7955887  HOXC5
8016433 8016433  HOXB1
Here are some other fields in hugene10st.db:
> hugene10st
hugene10st               hugene10stENZYME         hugene10stPFAM
hugene10st.db::          hugene10stENZYME2PROBE   hugene10stPMID
hugene10stACCNUM         hugene10stGENENAME       hugene10stPMID2PROBE
hugene10stALIAS2PROBE    hugene10stGO             hugene10stPROSITE
hugene10stCHR            hugene10stGO2ALLPROBES   hugene10stREFSEQ
hugene10stCHRLENGTHS     hugene10stGO2PROBE       hugene10stSYMBOL
hugene10stCHRLOC         hugene10stMAP            hugene10stUNIGENE
hugene10stCHRLOCEND      hugene10stMAPCOUNTS      hugene10stUNIPROT
hugene10stENSEMBL        hugene10stOMIM           hugene10st_dbInfo
hugene10stENSEMBL2PROBE  hugene10stORGANISM       hugene10st_dbconn
hugene10stENTREZID       hugene10stPATH           hugene10st_dbfile
                         hugene10stPATH2PROBE     hugene10st_dbschema
...
-------------
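To get the symbols into the CSV itself, one option is to bind them onto the expression matrix before writing it out. Below is a minimal sketch in base R, not a definitive recipe. The toy `symbols` vector stands in for `unlist(as.list(mogene10stSYMBOL))` (assuming the mouse package follows the same naming as the human one), and `mat`, its values and the ID "7999999" are made up for illustration.

```r
# Toy stand-in for: symbols <- unlist(as.list(mogene10stSYMBOL))
# (a named character vector: names are Affy IDs, values are gene symbols)
symbols <- c("7950136" = "PHOX2A", "7955845" = "HOXC13", "7955852" = "HOXC12")

# Toy stand-in for the log2 expression matrix, with Affy IDs as row names.
# "7999999" is a made-up ID that has no annotation.
mat <- matrix(c(7.1, 8.3, 6.2, 5.0,
                7.4, 8.1, 6.0, 5.2),
              ncol = 2,
              dimnames = list(c("7950136", "7955845", "7955852", "7999999"),
                              c("array1", "array2")))

# Indexing a named vector by name returns NA for IDs without a symbol,
# so the result always lines up row-for-row with the matrix.
annotated <- data.frame(affyid = rownames(mat),
                        symbol = unname(symbols[rownames(mat)]),
                        mat,
                        stringsAsFactors = FALSE)

# Write to a temporary location; use your own file name in practice.
out <- file.path(tempdir(), "data_annotated.csv")
write.csv(annotated, file = out, row.names = FALSE)
```

Unannotated units come out with an NA symbol rather than being dropped, so the row order of the matrix is preserved.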
> These probesets
> also do not match the probeset_ids from MoGene-1_0-st-v1.na27.mm9 off
> the affymetrix website.
Perhaps you want 'transcript_cluster_id's?
(CSV files from
http://www.affymetrix.com/products_services/arrays/specific/mousegene_1_st.affx)
> tr <- read.csv("MoGene-1_0-st-v1.na27.mm9.transcript.csv",header=TRUE,comment.char="#")
> ps <- read.csv("MoGene-1_0-st-v1.na27.mm9.probeset.csv",header=TRUE,comment.char="#")
> cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",verbose=verbose)
> un <- getUnitNames(cdf)
> sum( un %in% ps$transcript_cluster_id )
[1] 28815
> sum( un %in% tr$transcript_cluster_id )
[1] 35474
You may also be interested in the following thread, which explains the
difference in number of probesets:
http://thread.gmane.org/gmane.science.biology.informatics.conductor/19591/
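Since the unit names line up with `transcript_cluster_id`, the same key can be used to pull annotation columns out of the transcript CSV. This is a rough sketch with toy data, not the real file: the IDs are made up, `tr` has only a few of the real columns, and `gene_assignment` is an assumed column name that you should check against your own copy of the CSV.

```r
# Toy stand-in for the transcript CSV read with read.csv(); the real file has
# many more columns. 'gene_assignment' is an assumed column name here.
tr <- data.frame(transcript_cluster_id = c(10338001L, 10338002L, 10344614L),
                 category = c("control->affx", "control->affx", "main"),
                 gene_assignment = c("---", "---", "toy gene record"),
                 stringsAsFactors = FALSE)

# Toy stand-in for: un <- getUnitNames(cdf)
# Unit names are character strings; "99999999" is not in the annotation.
un <- c("10344614", "10338001", "99999999")

# match() gives, for each unit name, the row of 'tr' with the same ID
# (NA when the unit is absent from the annotation file).
idx <- match(un, as.character(tr$transcript_cluster_id))

# Pull the annotation columns in unit-name order.
annot <- data.frame(unit = un,
                    category = tr$category[idx],
                    gene_assignment = tr$gene_assignment[idx],
                    stringsAsFactors = FALSE)
```

Using `match()` rather than `merge()` keeps the rows in the same order as the unit names, which matters if `annot` is later bound to the expression matrix.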
> here is my session:
>
>> library('aroma.affymetrix')
>> cdf <- AffymetrixCdfFile$byChipType("MoGene-1_0-st-v1",tags='r3')
>> cs <- AffymetrixCelSet$byName("Files", cdf=cdf)
>> bc <- RmaBackgroundCorrection(cs)
>> csBC <- process(bc,verbose=verbose)
>> qn <- QuantileNormalization(csBC, typesToUpdate="pm")
>> csN <- process(qn, verbose=verbose)
>> plm <- RmaPlm(csN)
>> fit(plm, verbose=verbose)
>> qam <- QualityAssessmentModel(plm)
>> ces <- getChipEffectSet(plm)
>> mat <- extractMatrix(ces)
>> mat <- log2(mat)
>> rownames(mat) <- getUnitNames(cdf)
>> write.csv(mat, file="data.csv")
>
> I am sure there is a simple solution to this and I apologize as I am
> new to "R". Any help would be much appreciated. Also, what are people's
> opinions on the "positive" and "negative control" probesets? Should
> these be included as part of a final gene list?
> Thank you in advance for any help.
Good question. Some people use the controls for QC and some use them
to adjust for background (for example, the pool of GC-content
probes). But if you follow this up with some kind of differential
expression analysis (e.g. limma), I would definitely discard the
non-"main" probes. For example:
> table(tr$category)

            control->affx control->bgp->antigenomic                      main
                       22                        45                     28815
           normgene->exon          normgene->intron  rescue->FLmRNA->unmapped
                     1324                      5222                        91
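Dropping the non-"main" units before a differential expression analysis could look roughly like this. It is a sketch on toy data: the IDs and expression values are invented, and `tr` and `mat` stand in for the transcript CSV and the expression matrix from the session above.

```r
# Toy stand-ins: 'tr' mimics two columns of the transcript CSV and 'mat'
# mimics the expression matrix with unit names as row names (values made up).
tr <- data.frame(transcript_cluster_id = c("10338001", "10344614", "10344616"),
                 category = c("control->affx", "main", "main"),
                 stringsAsFactors = FALSE)
mat <- matrix(c(9.8, 6.1, 7.3,
                9.9, 6.4, 7.0),
              ncol = 2,
              dimnames = list(c("10338001", "10344614", "10344616"),
                              c("array1", "array2")))

# IDs of the "main" transcript clusters.
main_ids <- tr$transcript_cluster_id[tr$category == "main"]

# Keep only those rows; drop = FALSE preserves the matrix shape
# even if a single row survives.
mat_main <- mat[rownames(mat) %in% main_ids, , drop = FALSE]
```

With the real CSV, `read.csv()` will most likely read the ID column as numbers; `%in%` coerces both sides to a common type, but wrapping the column in `as.character()` makes the comparison explicit.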
Hope that helps.
Mark
------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: [email protected]
e: [email protected]
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------