[aroma.affymetrix] Re: annotation of ST gene Arrays

Mark Robinson Tue, 10 Feb 2009 01:13:43 -0800

Hi Simon.

See comments below.


> I am using the mouse  gene ST arrays and am having problems with
> annotation. When i write a csv file, the annotation is only the
> probeset_id, no gene names or accession numbers etc.

That's what it should be.  Actually, it's the 'transcript_cluster_id'.
Previously, Affy did not provide annotation at the "probeset" level.

The CDF file just contains the identifiers.  Linking results (e.g.  
expression summaries) to the annotation can be done with other R  
packages.  For example, here is some code I gave Sebastien a few weeks  
ago that will get you started (just replace hugene10st.db with  
mogene10st.db):

-------------
Say you have some Affy identifiers:

 > myids
 [1] "7950136" "7955845" "7955852" "7955855" "7955858" "7955865" "7955869"
 [8] "7955873" "7955887" "8016433"

Load package and read off the gene symbols:

 > library(hugene10st.db)
 > symbols <- unlist(as.list(hugene10stSYMBOL))
 > data.frame(affyid = myids,symbol = symbols[myids])
         affyid symbol
7950136 7950136 PHOX2A
7955845 7955845 HOXC13
7955852 7955852 HOXC12
7955855 7955855 HOXC11
7955858 7955858 HOXC10
7955865 7955865  HOXC9
7955869 7955869  HOXC8
7955873 7955873  HOXC6
7955887 7955887  HOXC5
8016433 8016433  HOXB1

Here are some other fields in hugene10st.db:

 > hugene10st
hugene10st               hugene10st.db::          hugene10stACCNUM
hugene10stALIAS2PROBE    hugene10stCHR            hugene10stCHRLENGTHS
hugene10stCHRLOC         hugene10stCHRLOCEND      hugene10stENSEMBL
hugene10stENSEMBL2PROBE  hugene10stENTREZID       hugene10stENZYME
hugene10stENZYME2PROBE   hugene10stGENENAME       hugene10stGO
hugene10stGO2ALLPROBES   hugene10stGO2PROBE       hugene10stMAP
hugene10stMAPCOUNTS      hugene10stOMIM           hugene10stORGANISM
hugene10stPATH           hugene10stPATH2PROBE     hugene10stPFAM
hugene10stPMID           hugene10stPMID2PROBE     hugene10stPROSITE
hugene10stREFSEQ         hugene10stSYMBOL         hugene10stUNIGENE
hugene10stUNIPROT        hugene10st_dbInfo        hugene10st_dbconn
hugene10st_dbfile        hugene10st_dbschema

...
-------------
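
To get the symbols into the CSV itself, something along these lines might work. This is only a minimal sketch: annotateMatrix is a hypothetical helper (not part of aroma.affymetrix or the annotation packages), the expression values and the ID "9999999" are made up, and the small `symbols` vector stands in for unlist(as.list(hugene10stSYMBOL)) (or the mogene10stSYMBOL equivalent for the mouse array).

```r
# Hypothetical helper (an assumption, not an aroma.affymetrix function):
# prepend a symbol column to a matrix whose rownames are Affy identifiers.
annotateMatrix <- function(mat, symbols) {
  ids <- rownames(mat)
  data.frame(affyid = ids,
             symbol = unname(symbols[ids]),  # NA where no symbol is known
             mat,
             row.names = NULL, check.names = FALSE, stringsAsFactors = FALSE)
}

# Stand-in for unlist(as.list(hugene10stSYMBOL)); values taken from the output above.
symbols <- c("7950136" = "PHOX2A", "7955845" = "HOXC13", "8016433" = "HOXB1")

# Toy log2 expression matrix; "9999999" is a made-up ID with no annotation.
mat <- matrix(c(8.1, 7.4, 9.0, 6.2, 8.3, 7.1, 9.2, 6.0), nrow = 4,
              dimnames = list(c("7950136", "7955845", "8016433", "9999999"),
                              c("chip1", "chip2")))

annotated <- annotateMatrix(mat, symbols)

# Written to a temporary directory here; use your own path in practice.
write.csv(annotated, file = file.path(tempdir(), "data_annotated.csv"),
          row.names = FALSE)
```

Identifiers without a symbol end up as NA in the symbol column rather than being dropped, so the rows still line up with the original matrix.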



> These probesets
> also do not match the probeset_ids from MoGene-1_0-st-v1.na27.mm9 off
> the affymetrix website.

Perhaps you want 'transcript_cluster_id's?

(CSV files from 
http://www.affymetrix.com/products_services/arrays/specific/mousegene_1_st.affx)

 > tr <- read.csv("MoGene-1_0-st-v1.na27.mm9.transcript.csv",header=TRUE,comment.char="#")
 > ps <- read.csv("MoGene-1_0-st-v1.na27.mm9.probeset.csv",header=TRUE,comment.char="#")

 > cdf <- AffymetrixCdfFile$fromChipType("MoGene-1_0-st-v1",verbose=verbose)
 > un <- getUnitNames(cdf)
 > sum( un %in% ps$transcript_cluster_id )
[1] 28815
 > sum( un %in% tr$transcript_cluster_id )
[1] 35474

You may also be interested in the following thread, which explains the  
difference in number of probesets:
http://thread.gmane.org/gmane.science.biology.informatics.conductor/19591/
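
If you want to carry columns from the transcript CSV over to the units (rather than just count overlaps), match() is one way to line them up. A rough sketch with toy data: the data frame below stands in for `tr`, the vector for getUnitNames(cdf), and all ID values are made up. Only the column names transcript_cluster_id and category come from the files discussed here.

```r
# Toy stand-in for the transcript CSV (the real file has many more columns).
tr <- data.frame(transcript_cluster_id = c(10338001L, 10338002L, 10344614L),
                 category = c("control->affx", "normgene->intron", "main"),
                 stringsAsFactors = FALSE)

# Toy stand-in for getUnitNames(cdf); unit names are character strings.
un <- c("10344614", "10338001", "10600000")

# match() coerces both sides to a common type, so character unit names can be
# looked up against the integer IDs read from the CSV.
idx <- match(un, tr$transcript_cluster_id)

unitAnnot <- data.frame(unit = un,
                        category = tr$category[idx],  # NA for units not in the CSV
                        stringsAsFactors = FALSE)

sum(!is.na(idx))  # same count as sum(un %in% tr$transcript_cluster_id)
```

The result keeps one row per unit in CDF order, which makes it safe to bind next to an expression matrix extracted from the same CDF.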


> here is my session:
>
>> library('aroma.affymetrix')
>> cdf <- AffymetrixCdfFile$byChipType("MoGene-1_0-st-v1",tags='r3')
>> cs <- AffymetrixCelSet$byName("Files", cdf=cdf)
>> bc <- RmaBackgroundCorrection(cs)
>> csBC <- process(bc,verbose=verbose)
>> qn <- QuantileNormalization(csBC, typesToUpdate="pm")
>> csN <- process(qn, verbose=verbose)
>> plm <- RmaPlm(csN)
>> fit(plm, verbose=verbose)
>> qam <- QualityAssessmentModel(plm)
>> ces <- getChipEffectSet(plm)
>> mat <- extractMatrix(ces)
>> mat <- log2(mat)
>> rownames(mat) <- getUnitNames(cdf)
>> write.csv(mat, file="data.csv")
>
> I am sure there is a simple solution to this and I apologize as I am
> new to "R". Any help would be much appreciated. Also, what are people
> opinions on the "positive" and "negative controls" probesets? Should
> these be included as part of a final gene list?
> Thank you in advance for any help.

Good question.  Some people use the controls for QC and some use them
for background adjustment (for example, the pool of GC content
probes).  But if you follow this up with some kind of differential
expression analysis (e.g. limma), I would definitely discard the
non-"main" probes.  For example:

 > table(tr$category)

             control->affx control->bgp->antigenomic                      main
                        22                        45                     28815
            normgene->exon          normgene->intron  rescue->FLmRNA->unmapped
                      1324                      5222                        91
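
Dropping the non-"main" rows could then look roughly like this. Again a sketch on toy data: `tr` and `mat` below are small made-up stand-ins for the transcript CSV and the log2 expression matrix, and the ID values are invented.

```r
# Toy stand-ins: 'tr' mimics the transcript CSV, 'mat' the log2 expression matrix.
tr <- data.frame(transcript_cluster_id = c(101L, 102L, 103L, 104L),
                 category = c("main", "control->affx", "main", "normgene->exon"),
                 stringsAsFactors = FALSE)
mat <- matrix(1:8 / 2, nrow = 4,
              dimnames = list(c("101", "102", "103", "104"), c("chip1", "chip2")))

# Look up each row's category, then keep only rows flagged "main".
category <- tr$category[match(rownames(mat), tr$transcript_cluster_id)]
keep <- !is.na(category) & category == "main"  # unannotated rows are dropped too
matMain <- mat[keep, , drop = FALSE]
```

matMain is what you would then hand to the downstream analysis; drop = FALSE keeps it a matrix even if only one row survives.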


Hope that helps.
Mark




------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: [email protected]
e: [email protected]
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------





