[aroma.affymetrix] Re: FIRMA score

sabrina Wed, 10 Dec 2008 13:01:53 -0800

Thanks! Mark:
I assume that you referred me to the probeset annotation csv file ,
not the transcript annotation csv file. :)
I did get the exon coordinates! it seemed to me that the probeset_id
in the probeset annotation file is equivalent to groupName of
exFirma.


One question I have is in the exon array plot from GenomeGraphs, the
unrData is the probe level data, in other words, in my case, I should
use the normalized data, csN, is that correct?

another question is graph related though. Since I have two groups
among my 9 samples, when I plot exon array, is there any way to use
different color coding for these samples on the same graph? Thanks!

Sabrina

On Dec 9, 5:02 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> Hi Sabrina.
>
> Great work!  See below.
>
> On 10/12/2008, at 5:27 AM, sabrina wrote:
>
>
>
>
>
> > Hi, Mark!
> > Thank you so much! I think I pretty much figured out how to get the
> > gene level, exon level expressions and comparison done. I checked the
> > GenomeGraphs as you suggested. I tried the code in the exon array
> > section, and it worked fine. So here is the question related to my
> > data set.
>
> > In order to plot exon data, I used the following code to get exon
> > summary (intensities) which is pbsetSummLog2:
>
> > plmNoMerge<- ExonRmaPlm(csN, mergeGroups=FALSE)
> > fit(plmNoMerge)
> > readUnits(plmNoMerge,units=1)
>
> > chpNoMerge<-getChipEffectSet(plmNoMerge)
>
> > pbsetSumm<-extractMatrix(chpNoMerge,returnUgcMap=TRUE)
> > pbsetSummLog2<-log2(pbsetSumm);
> > pbsetNames<-readCdfGroupNames(getPathname(getCdf(chpNoMerge)),
> >    unit=unique(attr(pbsetSumm,"unitGroupCellMap")[,"unit"]))
> > rownames(pbsetSumm)<-unlist(pbsetNames)
> > rownames(pbsetSummLog2)<-unlist(pbsetNames)
>
> > Because each significant exon that was detected by FIRMA is associated
> > with one gene, and that gene has several exons, therefore, I used the
> > following to find all exons associated to that gene:
> > (x is the result from topTable)
>
> > temp<-grep(exFirma[x$ID,1][1],exFirma[,1]);
> > gp_temp<-exFirma[temp,2]
>
> > gene1<-pbsetSummLog2[temp,]
>
> > which gives me:
> >                  array1              array 2 ...
> > 4308385  6.338784        6.896304
> > 4376965  1.973171        2.272406
>
> > I know that "430835" is the groupName (aka exon id), but I am not sure
> > where to find the start and end position of these individual exons,
> > can you show me how to do it?
>
> > The reason I asked this is because in makeExonArray, I need the
> > probeStart and probeEnd positions. Thanks!!!
>
> Well, you can get the probeset start and end positions from the  
> 'NetAffx Annotation Files' from Affymetrix.  For human, this would be  
> at:
>
> http://www.affymetrix.com/products_services/arrays/specific/exon.affx...
>
> I think you said you were using mouse, so you can find this at:
>
> http://www.affymetrix.com/products_services/arrays/specific/mouse_exo...
>
> Find the 'Current NetAffx Annotation Files' section and download the  
> csv.zip file.  Just to have a quick peak at a few columns of this  
> file, if I run a unix tool called awk on the CSV file you get:
>
> --------
> awk '{FS=","; print $7,$1,$2,$3,$4,$5,$16}' HuEx-1_0-st-
> v2.na25.hg18.probeset.csv | grep -v "^ " | more
> --------
>
> "transcript_cluster_id" "probeset_id" "seqname" "strand" "start"  
> "stop" "level"
> ...
> "2315373" "2315374" "chr1" "+" "742655" "742719" "core"
> "2315373" "2315375" "chr1" "+" "742869" "743231" "core"
> "2315373" "2315376" "chr1" "+" "743293" "743434" "core"
> "2315373" "2315377" "chr1" "+" "744094" "744979" "core"
> "2315380" "2315381" "chr1" "+" "752897" "752974" "extended"
> "2315380" "2315382" "chr1" "+" "754246" "754274" "extended"
> ...
>
> so there you have everything you need -- identifiers and chromosome  
> locations.  You can just use 'read.csv' to read it into R and pull off  
> the coordinates you need.
>
> Cheers,
> Mark
>
>
>
> > Sabrina
> > On Dec 8, 4:54 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> >> Hi Sabrina.
>
> >> On 08/12/2008, at 3:25 PM, sabrina wrote:
>
> >>> Hi, Mark:
> >>> Thanks for the help! I think now I have better understanding of  
> >>> LIMMA
> >>> now after I read the related chapter of LIMMA user's guide :)
>
> >>> On Dec 4, 4:10 pm, "Mark Robinson" <[EMAIL PROTECTED]> wrote:
> >>>> Sabrina.
>
> >>>> When asking questions like this, please post your full set of
> >>>> commands.
> >>>> Its hard to decode what variables have been set without this. :)
>
> >>>> I'm guessing that affyAnnotation is what came from the 'read.csv'
> >>>> call and
> >>>> x2 is the output of topTable?  Have you set the rownames of  
> >>>> exFirma?
>
> >>> x2 is the output of topTable
> >>> I do not know how to set rownames of exFirma, when I check exFirma
> >>> [1:3,1:6], it gave me the following:
>
> >>> unitName groupName unit group cell    Array1
> >>> 1  6838637   4304927    1     1    1  1.0372756
> >>> 2  6838637   4330595    1     2    2 -1.1679380
> >>> 3  6838637   4356771    1     3    3  1.2217411
>
> >>> Can you show me how to set the rownames? where can I find the row
> >>> names? Is it correct that groupName is the exon id ?
>
> >> A good place to start:
>
> >> ?rownames
>
> >>> I used the following code to import affymetrix annotation file:
>
> >>> affyAnnotation<- read.csv(file="MoEx-1_0-st-
> >>> v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20);
>
> >>> and
>
> >>>> Is this just a typo then?  It should be
> >>>> affyAnnotation$transcript_cluster_id instead of
> >>>> affyAnnotation$transcript_cluster_Id.
>
> >>>> What does exFirma[x2$ID,1] actually give you?  If the
> >>>> rownames(exFirma)
> >>>> are not set, this could mislead you.
>
> >>> I used the suggestion you gave to me to redo the lmFit, and here is
> >>> what I used :
>
> >>> mm <- model.matrix(~cls)
> >>> fit <- lmFit(exFirma[,6:ncol(exFirma)], mm)
> >>> fit<-eBayes(fit)
> >>> x<-topTable(fit,coef=2,adj="fdr")
>
> >>> Then I used exFirma[x$ID,1], it gave me:
> >>> [1] "6996667" "6965348" "6982181" "6917790" "6899766" "7005797"
> >>> "6880577" "6962054" "6897349"
> >>> [10] "7018774"
>
> >>> when I printed x, it gave me:
>
> >>>           ID     logFC         t      P.Value adj.P.Val         B
> >>> 11253   11253 -4.856783 -8.524568 2.223806e-06 0.4982258 -4.323317
> >>> 183160 183160  4.058556  7.676406 6.413325e-06 0.7184271 -4.332955
> >>> 32235   32235 -3.692429 -6.555475 2.963704e-05 1.0000000 -4.350176
> >>> 132610 132610  2.315536  6.508698 3.170265e-05 1.0000000 -4.351036
> >>> 165279 165279  3.269404  6.311540 4.225007e-05 1.0000000 -4.354809
> >>> 171777 171777  3.845620  6.174738 5.172731e-05 1.0000000 -4.357574
> >>> 45380   45380  1.971388  6.088251 5.886544e-05 1.0000000 -4.359388
> >>> 132873 132873 -4.569042 -6.021868 6.505114e-05 1.0000000 -4.360815
> >>> 86037   86037  3.206732  5.890372 7.943255e-05 1.0000000 -4.363738
> >>> 27761   27761 -2.651559 -5.825434 8.774484e-05 1.0000000 -4.365229
>
> >>> when I typed exFirma[11253,], it gave me:
>
> >>> exFirma[11253,]
> >>>      unitName groupName unit group  cell   array1, ...
> >>> 11253  6996667   5092691  535     6 11253 2.718783 ...
>
> >>> so it appeared to me that x$ID matches the index of exFirma entry.
>
> >>> Then I used;
> >>> index<-match(exFirma[x$ID,1], affyAnnotation$transcript_cluster_id)
> >>> to find the index of these entries (significant different between  
> >>> two
> >>> groups) in Affy annotation, then I checked the entries in Affy
> >>> Annotation file:
> >>> affyAnnotation[index, 1:5]
>
> >>>    transcript_cluster_id probeset_id      seqname strand     start
> >>> 71605               6996667     6996667         chr9      -  
> >>> 70631116
> >>> 62004               6965348     6965348         chr7      +  
> >>> 151009070
> >>> 67351               6982181     6982181         chr8      -  
> >>> 47660950
> >>> ...
>
> >> This all looks good to me.
>
> >>> But then I don't know how to check whether it is correct or not , or
> >>> how to find and plot corresponding probeset (exon ) as what was  
> >>> shown
> >>> in the FIRMA paper supplementary . Can you give me some hints?  
> >>> Thanks!
>
> >> Elizabeth made the plots in the paper and has some code for it.  For
> >> starters, familiarize yourself with the 'GenomeGraphs' package --  
> >> note
> >> that in the GenomeGraphs vignette, it has an exon array example.  I
> >> have a vignette for this, but haven't finished it yet.
>
> >> HTH,
> >> Mark
>
> >>> Sabrina
>
> >>>> Cheers,
> >>>> Mark
>
> >>>>> Ok, I just used the following line to read Affy Annotation
>
> >>>>>  read.csv(file="MoEx-1_0-st-
> >>>>> v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20)
>
> >>>>> then I used
> >>>>>  match(exFirma[x2$ID,1],affyAnnotation$transcript_cluster_Id)
>
> >>>>> but I only got NAs
>
> >>>>> I can not figure out what went wrong because I checked the first  
> >>>>> ID,
> >>>>> it is in Affy annotation file. Can you point out what I did
> >>>>> incorrectly? Thanks !
>
> >>>>> Sabrina
>
> >>>>> On Dec 3, 4:53 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> >>>>>> Sabrina.
>
> >>>>>> See below.
>
> >>>>>> On 04/12/2008, at 8:41 AM, sabrina wrote:
>
> >>>>>>> Hi, Mark:
>
> >>>>>>> Thank you so much for the detailed explanation!
> >>>>>>> Actually I am working on Mouse Exon arrays.  I did check the  
> >>>>>>> UCSC
> >>>>>>> genome browser, but they do not have the Exon Array available  
> >>>>>>> as
> >>>>>>> Human Exon.
>
> >>>>>> Right! Good point.
>
> >>>>>>> My study is trying to find alternative splicing events between  
> >>>>>>> two
> >>>>>>> groups with identical genotypes but different treatments. each
> >>>>>>> group
> >>>>>>> has 4 or 5 biological replicates. So I did all as described in  
> >>>>>>> the
> >>>>>>> examples from Human Exon array, then used the following codes to
> >>>>>>> find
> >>>>>>> the significant genes with potential splicing events:
>
> >>>>>>> ...
> >>>>>>> exFirma<-extractDataFrame(firmaScore,addNames=TRUE);
> >>>>>>> exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)])
>
> >>>>>>> library(limma)
>
> >>>>>>> design <-
> >>>>>>> cbind(Grp1=c(1,1,1,1,1,0,0,0,0),Grp2=c(0,0,0,0,0,1,1,1,1))
>
> >>>>>>> fit<-lmFit(exFirma[,6:ncol(exFirma)],design)
> >>>>>>> fit<-eBayes(fit)
> >>>>>>> x<-topTable(fit)
>
> >>>>>> Careful here as to what you are fitting and testing with limma.
> >>>>>> You
> >>>>>> are fitting a parameterization that would require a contrast to
> >>>>>> be fit
> >>>>>> ( ... such as 'contrasts.fit' ... ) in order to assess the
> >>>>>> differences
> >>>>>> between your 2 groups.  Did you get this example from a web page
> >>>>>> or a
> >>>>>> thread somewhere?  If so, I should correct it.
>
> >>>>>> Probably what you want is:
>
> >>>>>> design <- cbind(Grp1=1,Grp2=c(0,0,0,0,0,1,1,1,1))
>
> ...
>
> read more »
--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~----------~----~----~----~------~----~------~--~---

[aroma.affymetrix] Re: FIRMA score

Reply via email to