Thanks! Mark: I assume that you referred me to the probeset annotation csv file , not the transcript annotation csv file. :) I did get the exon coordinates! it seemed to me that the probeset_id in the probeset annotation file is equivalent to groupName of exFirma.
One question I have is in the exon array plot from GenomeGraphs, the unrData is the probe level data, in other words, in my case, I should use the normalized data, csN, is that correct? another question is graph related though. Since I have two groups among my 9 samples, when I plot exon array, is there any way to use different color coding for these samples on the same graph? Thanks! Sabrina On Dec 9, 5:02 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > Hi Sabrina. > > Great work! See below. > > On 10/12/2008, at 5:27 AM, sabrina wrote: > > > > > > > Hi, Mark! > > Thank you so much! I think I pretty much figured out how to get the > > gene level, exon level expressions and comparison done. I checked the > > GenomeGraphs as you suggested. I tried the code in the exon array > > section, and it worked fine. So here is the question related to my > > data set. > > > In order to plot exon data, I used the following code to get exon > > summary (intensities) which is pbsetSummLog2: > > > plmNoMerge<- ExonRmaPlm(csN, mergeGroups=FALSE) > > fit(plmNoMerge) > > readUnits(plmNoMerge,units=1) > > > chpNoMerge<-getChipEffectSet(plmNoMerge) > > > pbsetSumm<-extractMatrix(chpNoMerge,returnUgcMap=TRUE) > > pbsetSummLog2<-log2(pbsetSumm); > > pbsetNames<-readCdfGroupNames(getPathname(getCdf(chpNoMerge)), > > unit=unique(attr(pbsetSumm,"unitGroupCellMap")[,"unit"])) > > rownames(pbsetSumm)<-unlist(pbsetNames) > > rownames(pbsetSummLog2)<-unlist(pbsetNames) > > > Because each significant exon that was detected by FIRMA is associated > > with one gene, and that gene has several exons, therefore, I used the > > following to find all exons associated to that gene: > > (x is the result from topTable) > > > temp<-grep(exFirma[x$ID,1][1],exFirma[,1]); > > gp_temp<-exFirma[temp,2] > > > gene1<-pbsetSummLog2[temp,] > > > which gives me: > > array1 array 2 ... > > 4308385 6.338784 6.896304 > > 4376965 1.973171 2.272406 > > > I know that "430835" is the groupName (aka exon id), but I am not sure > > where to find the start and end position of these individual exons, > > can you show me how to do it? > > > The reason I asked this is because in makeExonArray, I need the > > probeStart and probeEnd positions. Thanks!!! > > Well, you can get the probeset start and end positions from the > 'NetAffx Annotation Files' from Affymetrix. For human, this would be > at: > > http://www.affymetrix.com/products_services/arrays/specific/exon.affx... > > I think you said you were using mouse, so you can find this at: > > http://www.affymetrix.com/products_services/arrays/specific/mouse_exo... > > Find the 'Current NetAffx Annotation Files' section and download the > csv.zip file. Just to have a quick peak at a few columns of this > file, if I run a unix tool called awk on the CSV file you get: > > -------- > awk '{FS=","; print $7,$1,$2,$3,$4,$5,$16}' HuEx-1_0-st- > v2.na25.hg18.probeset.csv | grep -v "^ " | more > -------- > > "transcript_cluster_id" "probeset_id" "seqname" "strand" "start" > "stop" "level" > ... > "2315373" "2315374" "chr1" "+" "742655" "742719" "core" > "2315373" "2315375" "chr1" "+" "742869" "743231" "core" > "2315373" "2315376" "chr1" "+" "743293" "743434" "core" > "2315373" "2315377" "chr1" "+" "744094" "744979" "core" > "2315380" "2315381" "chr1" "+" "752897" "752974" "extended" > "2315380" "2315382" "chr1" "+" "754246" "754274" "extended" > ... > > so there you have everything you need -- identifiers and chromosome > locations. You can just use 'read.csv' to read it into R and pull off > the coordinates you need. > > Cheers, > Mark > > > > > Sabrina > > On Dec 8, 4:54 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > >> Hi Sabrina. > > >> On 08/12/2008, at 3:25 PM, sabrina wrote: > > >>> Hi, Mark: > >>> Thanks for the help! I think now I have better understanding of > >>> LIMMA > >>> now after I read the related chapter of LIMMA user's guide :) > > >>> On Dec 4, 4:10 pm, "Mark Robinson" <[EMAIL PROTECTED]> wrote: > >>>> Sabrina. > > >>>> When asking questions like this, please post your full set of > >>>> commands. > >>>> Its hard to decode what variables have been set without this. :) > > >>>> I'm guessing that affyAnnotation is what came from the 'read.csv' > >>>> call and > >>>> x2 is the output of topTable? Have you set the rownames of > >>>> exFirma? > > >>> x2 is the output of topTable > >>> I do not know how to set rownames of exFirma, when I check exFirma > >>> [1:3,1:6], it gave me the following: > > >>> unitName groupName unit group cell Array1 > >>> 1 6838637 4304927 1 1 1 1.0372756 > >>> 2 6838637 4330595 1 2 2 -1.1679380 > >>> 3 6838637 4356771 1 3 3 1.2217411 > > >>> Can you show me how to set the rownames? where can I find the row > >>> names? Is it correct that groupName is the exon id ? > > >> A good place to start: > > >> ?rownames > > >>> I used the following code to import affymetrix annotation file: > > >>> affyAnnotation<- read.csv(file="MoEx-1_0-st- > >>> v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20); > > >>> and > > >>>> Is this just a typo then? It should be > >>>> affyAnnotation$transcript_cluster_id instead of > >>>> affyAnnotation$transcript_cluster_Id. > > >>>> What does exFirma[x2$ID,1] actually give you? If the > >>>> rownames(exFirma) > >>>> are not set, this could mislead you. > > >>> I used the suggestion you gave to me to redo the lmFit, and here is > >>> what I used : > > >>> mm <- model.matrix(~cls) > >>> fit <- lmFit(exFirma[,6:ncol(exFirma)], mm) > >>> fit<-eBayes(fit) > >>> x<-topTable(fit,coef=2,adj="fdr") > > >>> Then I used exFirma[x$ID,1], it gave me: > >>> [1] "6996667" "6965348" "6982181" "6917790" "6899766" "7005797" > >>> "6880577" "6962054" "6897349" > >>> [10] "7018774" > > >>> when I printed x, it gave me: > > >>> ID logFC t P.Value adj.P.Val B > >>> 11253 11253 -4.856783 -8.524568 2.223806e-06 0.4982258 -4.323317 > >>> 183160 183160 4.058556 7.676406 6.413325e-06 0.7184271 -4.332955 > >>> 32235 32235 -3.692429 -6.555475 2.963704e-05 1.0000000 -4.350176 > >>> 132610 132610 2.315536 6.508698 3.170265e-05 1.0000000 -4.351036 > >>> 165279 165279 3.269404 6.311540 4.225007e-05 1.0000000 -4.354809 > >>> 171777 171777 3.845620 6.174738 5.172731e-05 1.0000000 -4.357574 > >>> 45380 45380 1.971388 6.088251 5.886544e-05 1.0000000 -4.359388 > >>> 132873 132873 -4.569042 -6.021868 6.505114e-05 1.0000000 -4.360815 > >>> 86037 86037 3.206732 5.890372 7.943255e-05 1.0000000 -4.363738 > >>> 27761 27761 -2.651559 -5.825434 8.774484e-05 1.0000000 -4.365229 > > >>> when I typed exFirma[11253,], it gave me: > > >>> exFirma[11253,] > >>> unitName groupName unit group cell array1, ... > >>> 11253 6996667 5092691 535 6 11253 2.718783 ... > > >>> so it appeared to me that x$ID matches the index of exFirma entry. > > >>> Then I used; > >>> index<-match(exFirma[x$ID,1], affyAnnotation$transcript_cluster_id) > >>> to find the index of these entries (significant different between > >>> two > >>> groups) in Affy annotation, then I checked the entries in Affy > >>> Annotation file: > >>> affyAnnotation[index, 1:5] > > >>> transcript_cluster_id probeset_id seqname strand start > >>> 71605 6996667 6996667 chr9 - > >>> 70631116 > >>> 62004 6965348 6965348 chr7 + > >>> 151009070 > >>> 67351 6982181 6982181 chr8 - > >>> 47660950 > >>> ... > > >> This all looks good to me. > > >>> But then I don't know how to check whether it is correct or not , or > >>> how to find and plot corresponding probeset (exon ) as what was > >>> shown > >>> in the FIRMA paper supplementary . Can you give me some hints? > >>> Thanks! > > >> Elizabeth made the plots in the paper and has some code for it. For > >> starters, familiarize yourself with the 'GenomeGraphs' package -- > >> note > >> that in the GenomeGraphs vignette, it has an exon array example. I > >> have a vignette for this, but haven't finished it yet. > > >> HTH, > >> Mark > > >>> Sabrina > > >>>> Cheers, > >>>> Mark > > >>>>> Ok, I just used the following line to read Affy Annotation > > >>>>> read.csv(file="MoEx-1_0-st- > >>>>> v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20) > > >>>>> then I used > >>>>> match(exFirma[x2$ID,1],affyAnnotation$transcript_cluster_Id) > > >>>>> but I only got NAs > > >>>>> I can not figure out what went wrong because I checked the first > >>>>> ID, > >>>>> it is in Affy annotation file. Can you point out what I did > >>>>> incorrectly? Thanks ! > > >>>>> Sabrina > > >>>>> On Dec 3, 4:53 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > >>>>>> Sabrina. > > >>>>>> See below. > > >>>>>> On 04/12/2008, at 8:41 AM, sabrina wrote: > > >>>>>>> Hi, Mark: > > >>>>>>> Thank you so much for the detailed explanation! > >>>>>>> Actually I am working on Mouse Exon arrays. I did check the > >>>>>>> UCSC > >>>>>>> genome browser, but they do not have the Exon Array available > >>>>>>> as > >>>>>>> Human Exon. > > >>>>>> Right! Good point. > > >>>>>>> My study is trying to find alternative splicing events between > >>>>>>> two > >>>>>>> groups with identical genotypes but different treatments. each > >>>>>>> group > >>>>>>> has 4 or 5 biological replicates. So I did all as described in > >>>>>>> the > >>>>>>> examples from Human Exon array, then used the following codes to > >>>>>>> find > >>>>>>> the significant genes with potential splicing events: > > >>>>>>> ... > >>>>>>> exFirma<-extractDataFrame(firmaScore,addNames=TRUE); > >>>>>>> exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)]) > > >>>>>>> library(limma) > > >>>>>>> design <- > >>>>>>> cbind(Grp1=c(1,1,1,1,1,0,0,0,0),Grp2=c(0,0,0,0,0,1,1,1,1)) > > >>>>>>> fit<-lmFit(exFirma[,6:ncol(exFirma)],design) > >>>>>>> fit<-eBayes(fit) > >>>>>>> x<-topTable(fit) > > >>>>>> Careful here as to what you are fitting and testing with limma. > >>>>>> You > >>>>>> are fitting a parameterization that would require a contrast to > >>>>>> be fit > >>>>>> ( ... such as 'contrasts.fit' ... ) in order to assess the > >>>>>> differences > >>>>>> between your 2 groups. Did you get this example from a web page > >>>>>> or a > >>>>>> thread somewhere? If so, I should correct it. > > >>>>>> Probably what you want is: > > >>>>>> design <- cbind(Grp1=1,Grp2=c(0,0,0,0,0,1,1,1,1)) > > ... > > read more » --~--~---------~--~----~------------~-------~--~----~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~----------~----~----~----~------~----~------~--~---