[aroma.affymetrix] Re: FIRMA score

sabrina Thu, 11 Dec 2008 08:20:42 -0800

Hi, Mark!
Thank you so much! After an intensive study of FIRMA and
aroma.affymetrix, I think I sort of figure out a routine for my exon
study.  And now I am more into R programming instead of C or Matlab :)
One side question I still have is the probe affinity effect. I never
used it and just wonder what is it for? Maybe I still don't fully
understand FIRMA model very well :(


Thanks!

Sabrina

On Dec 10, 5:34 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> > I assume that you referred me to the probeset annotation csv file ,
> > not the transcript annotation csv file. :)
>
> > I did get the exon coordinates! it seemed to me that the probeset_id
> > in the probeset annotation file is equivalent to groupName of
> > exFirma.
>
> Yes!  The transcript CSV file may be useful for other things, but  
> indeed not for exon coordinates.
>
> In aroma.affymetrix terminology for exon arrays,  
> unit=transcript_cluster_id, group=probeset_id and cell=probe.
>
> > One question I have is in the exon array plot from GenomeGraphs, the
> > unrData is the probe level data, in other words, in my case, I should
> > use the normalized data, csN, is that correct?
>
> If you look closely at the Purdom paper, there are a few things you  
> may want to plot in the context of Ensembl annotation.  The normalized  
> data is the obvious one, which you have linked to an object 'csN'.
>
> In addition, you may want to plot the residuals:
>
> rs <- calculateResiduals(plmTr)  # plmTr is the fitted ExonRmaPlm with  
> mergeGroups=TRUE
> d <- extractMatrix(rs,cells=...)
>
> ... or for example you may want to plot the raw data, adjusted by the  
> probe effects.  For example, RMA fits the model
>
> Y_{ij} = a_i + b_j
>
> (Y_{ij} = normalized data, a_i = chip effects, b_j = probe effects),  
> you may be interested in plotting:
>
> Y_{ij} - \hat{b}_j
>
> ... i.e. raw data, adjusted by the estimated probe effects.
>
> > another question is graph related though. Since I have two groups
> > among my 9 samples, when I plot exon array, is there any way to use
> > different color coding for these samples on the same graph? Thanks!
>
> Short answer:
>
> library(GenomeGraphs)
> ?"ExonArray-class"
>
> Long answer:  create your ExonArray object and use DisplayPars slot to  
> specify colour and line width:
>
> ea1<-new("ExonArray", intensity = d, probeStart = ..., probeEnd=...,
>               probeId = ..., nProbes = ..., dp = DisplayPars(color =  
> col, lwd=lwd,
>               mapColor = "dodgerblue2",plotMap=TRUE))
>
> where 'col' and 'lwd' are either length 1 (in which case all lines get  
> the same width and colour) or vectors of the length of the number of  
> samples ...
>
> Thanks for all these questions Sabrina.  This will help make a nice  
> vignette ... when I get time :)
>
> Unless you'd be interested in summarizing your approach! :)
>
> Cheers,
> Mark
>
> > Sabrina
>
> > On Dec 9, 5:02 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> >> Hi Sabrina.
>
> >> Great work!  See below.
>
> >> On 10/12/2008, at 5:27 AM, sabrina wrote:
>
> >>> Hi, Mark!
> >>> Thank you so much! I think I pretty much figured out how to get the
> >>> gene level, exon level expressions and comparison done. I checked  
> >>> the
> >>> GenomeGraphs as you suggested. I tried the code in the exon array
> >>> section, and it worked fine. So here is the question related to my
> >>> data set.
>
> >>> In order to plot exon data, I used the following code to get exon
> >>> summary (intensities) which is pbsetSummLog2:
>
> >>> plmNoMerge<- ExonRmaPlm(csN, mergeGroups=FALSE)
> >>> fit(plmNoMerge)
> >>> readUnits(plmNoMerge,units=1)
>
> >>> chpNoMerge<-getChipEffectSet(plmNoMerge)
>
> >>> pbsetSumm<-extractMatrix(chpNoMerge,returnUgcMap=TRUE)
> >>> pbsetSummLog2<-log2(pbsetSumm);
> >>> pbsetNames<-readCdfGroupNames(getPathname(getCdf(chpNoMerge)),
> >>>    unit=unique(attr(pbsetSumm,"unitGroupCellMap")[,"unit"]))
> >>> rownames(pbsetSumm)<-unlist(pbsetNames)
> >>> rownames(pbsetSummLog2)<-unlist(pbsetNames)
>
> >>> Because each significant exon that was detected by FIRMA is  
> >>> associated
> >>> with one gene, and that gene has several exons, therefore, I used  
> >>> the
> >>> following to find all exons associated to that gene:
> >>> (x is the result from topTable)
>
> >>> temp<-grep(exFirma[x$ID,1][1],exFirma[,1]);
> >>> gp_temp<-exFirma[temp,2]
>
> >>> gene1<-pbsetSummLog2[temp,]
>
> >>> which gives me:
> >>>                  array1              array 2 ...
> >>> 4308385  6.338784        6.896304
> >>> 4376965  1.973171        2.272406
>
> >>> I know that "430835" is the groupName (aka exon id), but I am not  
> >>> sure
> >>> where to find the start and end position of these individual exons,
> >>> can you show me how to do it?
>
> >>> The reason I asked this is because in makeExonArray, I need the
> >>> probeStart and probeEnd positions. Thanks!!!
>
> >> Well, you can get the probeset start and end positions from the
> >> 'NetAffx Annotation Files' from Affymetrix.  For human, this would be
> >> at:
>
> >>http://www.affymetrix.com/products_services/arrays/specific/
> >> exon.affx...
>
> >> I think you said you were using mouse, so you can find this at:
>
> >>http://www.affymetrix.com/products_services/arrays/specific/
> >> mouse_exo...
>
> >> Find the 'Current NetAffx Annotation Files' section and download the
> >> csv.zip file.  Just to have a quick peak at a few columns of this
> >> file, if I run a unix tool called awk on the CSV file you get:
>
> >> --------
> >> awk '{FS=","; print $7,$1,$2,$3,$4,$5,$16}' HuEx-1_0-st-
> >> v2.na25.hg18.probeset.csv | grep -v "^ " | more
> >> --------
>
> >> "transcript_cluster_id" "probeset_id" "seqname" "strand" "start"
> >> "stop" "level"
> >> ...
> >> "2315373" "2315374" "chr1" "+" "742655" "742719" "core"
> >> "2315373" "2315375" "chr1" "+" "742869" "743231" "core"
> >> "2315373" "2315376" "chr1" "+" "743293" "743434" "core"
> >> "2315373" "2315377" "chr1" "+" "744094" "744979" "core"
> >> "2315380" "2315381" "chr1" "+" "752897" "752974" "extended"
> >> "2315380" "2315382" "chr1" "+" "754246" "754274" "extended"
> >> ...
>
> >> so there you have everything you need -- identifiers and chromosome
> >> locations.  You can just use 'read.csv' to read it into R and pull  
> >> off
> >> the coordinates you need.
>
> >> Cheers,
> >> Mark
>
> >>> Sabrina
> >>> On Dec 8, 4:54 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> >>>> Hi Sabrina.
>
> >>>> On 08/12/2008, at 3:25 PM, sabrina wrote:
>
> >>>>> Hi, Mark:
> >>>>> Thanks for the help! I think now I have better understanding of
> >>>>> LIMMA
> >>>>> now after I read the related chapter of LIMMA user's guide :)
>
> >>>>> On Dec 4, 4:10 pm, "Mark Robinson" <[EMAIL PROTECTED]> wrote:
> >>>>>> Sabrina.
>
> >>>>>> When asking questions like this, please post your full set of
> >>>>>> commands.
> >>>>>> Its hard to decode what variables have been set without this. :)
>
> >>>>>> I'm guessing that affyAnnotation is what came from the 'read.csv'
> >>>>>> call and
> >>>>>> x2 is the output of topTable?  Have you set the rownames of
> >>>>>> exFirma?
>
> >>>>> x2 is the output of topTable
> >>>>> I do not know how to set rownames of exFirma, when I check exFirma
> >>>>> [1:3,1:6], it gave me the following:
>
> >>>>> unitName groupName unit group cell    Array1
> >>>>> 1  6838637   4304927    1     1    1  1.0372756
> >>>>> 2  6838637   4330595    1     2    2 -1.1679380
> >>>>> 3  6838637   4356771    1     3    3  1.2217411
>
> >>>>> Can you show me how to set the rownames? where can I find the row
> >>>>> names? Is it correct that groupName is the exon id ?
>
> >>>> A good place to start:
>
> >>>> ?rownames
>
> >>>>> I used the following code to import affymetrix annotation file:
>
> >>>>> affyAnnotation<- read.csv(file="MoEx-1_0-st-
> >>>>> v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20);
>
> >>>>> and
>
> >>>>>> Is this just a typo then?  It should be
> >>>>>> affyAnnotation$transcript_cluster_id instead of
> >>>>>> affyAnnotation$transcript_cluster_Id.
>
> >>>>>> What does exFirma[x2$ID,1] actually give you?  If the
> >>>>>> rownames(exFirma)
> >>>>>> are not set, this could mislead you.
>
> >>>>> I used the suggestion you gave to me to redo the lmFit, and here  
> >>>>> is
> >>>>> what I used :
>
> >>>>> mm <- model.matrix(~cls)
> >>>>> fit <- lmFit(exFirma[,6:ncol(exFirma)], mm)
> >>>>> fit<-eBayes(fit)
> >>>>> x<-topTable(fit,coef=2,adj="fdr")
>
> >>>>> Then I used exFirma[x$ID,1], it gave me:
> >>>>> [1] "6996667" "6965348" "6982181" "6917790" "6899766" "7005797"
> >>>>> "6880577" "6962054" "6897349"
> >>>>> [10] "7018774"
>
> >>>>> when I printed x, it gave me:
>
> >>>>>           ID     logFC         t      P.Value adj.P.Val         B
> >>>>> 11253   11253 -4.856783 -8.524568 2.223806e-06 0.4982258 -4.323317
> >>>>> 183160 183160  4.058556  7.676406 6.413325e-06 0.7184271 -4.332955
> >>>>> 32235   32235 -3.692429 -6.555475 2.963704e-05 1.0000000 -4.350176
> >>>>> 132610 132610  2.315536  6.508698 3.170265e-05 1.0000000 -4.351036
> >>>>> 165279 165279  3.269404  6.311540 4.225007e-05 1.0000000 -4.354809
> >>>>> 171777 171777  3.845620  6.174738 5.172731e-05 1.0000000 -4.357574
> >>>>> 45380   45380  1.971388  6.088251 5.886544e-05 1.0000000 -4.359388
> >>>>> 132873 132873 -4.569042 -6.021868 6.505114e-05 1.0000000 -4.360815
> >>>>> 86037   86037  3.206732  5.890372 7.943255e-05 1.0000000 -4.363738
> >>>>> 27761   27761 -2.651559 -5.825434 8.774484e-05 1.0000000 -4.365229
>
> >>>>> when I typed exFirma[11253,], it gave me:
>
> >>>>> exFirma[11253,]
> >>>>>      unitName groupName unit group  cell   array1, ...
> >>>>> 11253  6996667   5092691  535     6 11253 2.718783 ...
>
> >>>>> so it appeared to me that x$ID matches the index of exFirma entry.
>
> >>>>> Then I used;
> >>>>> index<-match(exFirma[x$ID,1], affyAnnotation
> >>>>> $transcript_cluster_id)
> >>>>> to find the index of these entries (significant different between
> >>>>> two
> >>>>> groups) in Affy annotation, then I checked the entries in Affy
> >>>>> Annotation file:
> >>>>> affyAnnotation[index, 1:5]
>
> >>>>>    transcript_cluster_id probeset_id      seqname strand     start
> >>>>> 71605               6996667     6996667         chr9      -
> >>>>> 70631116
> >>>>> 62004               6965348     6965348         chr7      +
> >>>>> 151009070
> >>>>> 67351               6982181     6982181         chr8      -
> >>>>> 47660950
> >>>>> ...
>
> >>>> This all looks good to me.
>
> >>>>> But then I don't know how to check whether
>
> ...
>
> read more »
--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~----------~----~----~----~------~----~------~--~---

[aroma.affymetrix] Re: FIRMA score

Reply via email to