Hi Xinjun. Comments below.
On 30/04/2009, at 12:33 AM, Xinjun Zhang wrote: > Hi: > > Sorry for the second ambiguous question. Now I will give it out in > another way: > > After limma analysis: > > # two groups: CEU and YRI > >design > CEU YRIvsCEU > GSM188687 1 0 > GSM188688 1 0 > GSM188861 1 1 > GSM188862 1 1 > > #Limma > fsDF <- extractDataFrame(fs, addNames=TRUE) > fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)]) > fit<-lmFit(fsDF[,-c(1:5)],design) > fit<-eBayes(fit) > fit$genes<-fsDF[,1] > topTable(fit, coef=NULL, number = 10, adjust="BH") > > Then I got output in RGui in this form: > > > topTable(fit,coef="YRIvsCEU",adjust="BH") > ID logFC t P.Value adj.P.Val B > 248067 3851537 10.781228 18.01769 5.069965e-07 0.1441163 > -4.159301 ## In this line, is "248067 " a NCBI geneid d and > "3851537" a probeset id? And if the first coloum is gene ID, what > is strange to me is that some of the ID is not a human gene id. > 219150 3721400 -12.364204 -14.91257 1.798048e-06 0.2146088 -4.164325 > 90041 2903401 -8.915503 -13.79270 3.021449e-06 0.2146088 -4.166979 > 80808 2836738 7.811150 13.45085 3.568320e-06 0.2146088 -4.167917 > 250529 3862018 -7.935552 -13.33698 3.774934e-06 0.2146088 -4.168245 > 176674 3462843 10.559478 12.92835 4.637400e-06 0.2158173 -4.169490 > 224640 3744039 7.930627 12.66410 5.314668e-06 0.2158173 -4.170356 > 134937 3224650 -9.385466 -12.18252 6.860763e-06 0.2437758 -4.172073 > 155392 3352948 -6.802731 -11.71350 8.878365e-06 0.2804133 -4.173938 > 104503 3003193 -8.947865 -11.50151 1.000676e-05 0.2844473 -4.174852 Be careful here. The first number you see here ("248067") is a row number that limma puts in, and is not an gene/probeset identifier of any kind. The "ID" column is what you have put in fit$genes (see above, you have the command "fit$genes <- fsDF[,1]"). I would actually recommend that you put more in fit$genes, because what you have have now is only the first column of the fsDF data frame, which gives the transcript_cluster_id. So, you have the transcript_cluster_id, but you don't know what probeset_id this corresponds to. For example: > head(fsDF[,1:5]) unitName groupName unit group cell 1 2315251 2315252 1 1 1 2 2315251 2315253 1 2 2 3 2315373 2315374 2 1 3 4 2315373 2315375 2 2 4 5 2315373 2315376 2 3 5 6 2315373 2315377 2 4 6 Maybe it would be better to do something like: ... fit$genes <- fsDF[,1:2] ... that way your output table will look something like: > topTable(fit,coef=1,n=2) unitName groupName logFC t P.Value adj.P.Val B 4166 3581637 3582005 -2.581813 -12.65587 3.786135e-13 2.342177e-09 19.22332 4459 2656818 2656822 -2.951434 -12.54443 4.684822e-13 2.342177e-09 19.03846 > > > > topTable(fit,coef=NULL,adjust.method="BH") # what is the > difference of the two forms of output? I mean, what does the column > "CEU" and "YRIvsCEU" mean? > ProbeID CEU YRIvsCEU F P.Value adj.P.Val > 248067 3851537 -5.235567 10.781228 162.45279 1.705464e-06 0.4847866 > 219150 3721400 6.152369 -12.364204 111.19496 6.042656e-06 0.7176463 > 90041 2903401 4.511199 -8.915503 95.13301 1.013043e-05 0.7176463 > 80808 2836738 -3.606537 7.811150 90.99296 1.173308e-05 0.7176463 > 250529 3862018 3.970771 -7.935552 88.93762 1.265099e-05 0.7176463 > 176674 3462843 -5.305275 10.559478 83.57312 1.552603e-05 0.7176463 > 224640 3744039 -3.792070 7.930627 80.34278 1.767260e-05 0.7176463 > 134937 3224650 4.691757 -9.385466 74.20693 2.292795e-05 0.8146732 > 155392 3352948 3.399463 -6.802731 68.60307 2.962962e-05 0.9358188 > 104503 3003193 4.306241 -8.947865 66.23524 3.322071e-05 0.9443152 This is where you should read and understand what limma (and specifically the topTable() routine) is doing. Briefly, when you say coef=NULL, you are testing the significance of all parameters simultaneously, which results in a moderated F-statistic. When you specify a single column (e.g. coef="YRIvsCEU" as you have), you are testing the significance of that single parameter, and you will get a moderated t-statistic. > And when I take a quick look at the output produced by the function > write.fit( ) : > write.fit(fit,results=NULL, "limma_result.out",digits=3, > adjust="none", method="separate") > > and the first line of the output file "limma_result.ou" is : > > Coef.CEU Coef.YRIvsCEU t.CEU t.YRIvsCEU p.value.CEU > p.value.YRIvsCEU F F.p.value Genes > -0.5 0.955 -1.08 1.46 0.31653 0.18906 1.07 > 0.39541 2315251 > 0.41 -0.632 0.88 -0.96 0.4071 0.36826 0.5 > 0.62453 2315251 > 0.168 -0.384 0.41 -0.66 0.69603 0.53134 0.22 > 0.80748 2315373 > -0.597 1.1 -1.44 1.88 0.19287 0.10287 1.78 > 0.23803 2315373 > 0.276 -0.396 0.68 -0.69 0.51883 0.51368 0.27 > 0.76801 2315373 > 0.118 -0.214 0.25 -0.32 0.80819 0.75689 0.05 > 0.94922 2315373 > -0.018 0.128 -0.04 0.21 0.96823 0.83876 0.03 > 0.9667 2315554 > -0.027 0.191 -0.06 0.31 0.95283 0.76783 0.07 > 0.9317 2315554 > 0.276 -0.629 0.65 -1.05 0.53693 0.33006 0.56 > 0.59639 2315554 > 0.405 -0.552 0.88 -0.85 0.40726 0.4232 0.44 > 0.66023 2315554 > 0.109 -0.199 0.23 -0.3 0.82444 0.77509 0.04 > 0.95666 2315554 > -0.318 0.771 -0.63 1.08 0.54789 0.31597 0.6 > 0.57392 2315554 > ..... > > It got only a colomn called "Genes" ( it is probe id, I guess). So > how can I use this output to find out the differentially expressed > exons( and the corresponding genes) in the two groups? Is it clearer > to you? Thanks in advance! > > Xinjun As I mentioned above, you should change what you put in fit$genes to allow you to know what probeset_id each row corresponds to. I'm guessing here, but you are probably interested in the "YRIvsCEU" parameter, so you'd probably sort on the p.value.YRIvsCEU column ... Cheers, Mark > > > > > > On Tue, Apr 28, 2009 at 4:29 PM, Mark Robinson > <mrobin...@wehi.edu.au> wrote: > > Hi Xinjun. > > Comments below. > > > On 28/04/2009, at 12:25 PM, Xinjun Zhang wrote: > > > Hi Mark: > > > > Thanks very much for your clarification! Now I have approached to > > limma analysis of FIRMA score to get differentially spliced genes > > ( and also splicing pattern of each ). But I still have some > > difficulty to understand the code ( in red ) below in Limma > analysis: > > > > #fs is the 'standard' FirmaSet-object > > fsDF <- extractDataFrame(fs, addNames=TRUE) > > fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)]) # I know why log2 > > is here but confused by fsDF[,c(1:5)] .... what does this expression > > mean? > > > Note that it is -c(1:5), meaning operate on (here, take logs) all of > the columns except 1:5 ... that is, because extractDataFrame gives > some extra columns at the beginning that are NOT data, we only want to > log that columns that have actual data. > > > > design <- cbind(Grp1=1,Grp2=c(rep(0,n_1),rep(1,n_2))) > > fit<-lmFit(fsDF[,-c(1:5)],design) > > fit<-eBayes(fit) > > fit$genes<-fsDF[,1] # Can I also get seperate splicing > > patterns for the two differentially spliced genes from two group > > (control and treatment )? > > > I'm not sure what you are asking here. The probesets where the "Grp2" > coefficient is significantly different from 0 may highlight > differentially spliced exons. Does that help? > > Mark > > > > > > > Thanks in advance! > > > > Xinjun > > > > On Mon, Apr 27, 2009 at 6:19 AM, Mark Robinson > > <mrobin...@wehi.edu.au> wrote: > > > > > > Hi Xinjun. > > > > Quick comments below. > > > > > > > Hi Mark: > > > > > > Thanks very much for your help and I have have got a quick start > > on a > > > small > > > dataset that each group (control and treatment ) contains 4 > > arrays. I have > > > set up a file structure like this: > > > ================================================= > > > rawDate/ > > > controlGroup/ > > > HuEx-1_0-st-v1/ > > > GSMXXXXXX.CEL > > > GSMXXXXXX.CEL > > > ............................ > > > > > > treatmentGroup/ > > > HuEx-1_0-st-v1/ > > > GSMXXXXXX.CEL > > > GSMXXXXXX.CEL > > > ............................ > > > ==================================================* > > > > > > This setup will need to be changed. You will want to put ALL > samples > > together to do the PLM fitting, normalization, FIRMA scoring, etc. > > > > Something like: > > > > rawData/ > > thisExperiment/ > > HuEx-1_0-st-v1/ > > sample1.CEL > > sample2.CEL > > ... > > sampleN.CEL > > > > > > > This is my code ( my questions are in red):* > > > > > > library(aroma.affymetrix) > > > > > > #Getting annotation data files > > > chipType <- "HuEx-1_0-st-v1" > > > cdf <- AffymetrixCdfFile$byChipType(chipType) > > > print(cdf) > > > > > > #Defining CEL set > > > cs <- AffymetrixCelSet$byName("controlGroup", cdf=cdf) > > > print(cs) > > > > > > #Background Adjustment and Normalization > > > bc <- RmaBackgroundCorrection(cs) > > > csBC <- process(bc,verbose=verbose) > > > > > > #quantile normalization > > > qn <- QuantileNormalization(csBC, typesToUpdate="pm") ### I set > the > > > second > > > parameter as "pm" as the chip type is Affymetrix exon array, is > that > > > right? > > > print(qn) > > > csN <- process(qn, verbose=verbose) > > > > This is fine. > > > > > > > > #Summarization > > > getCdf(csN) > > > ## * Fit exon-by-exon*, change the value of mergeGroups to FALSE > > in the > > > ExonRmaPlm() call above. > > > *plmEx *<- ExonRmaPlm(csN, mergeGroups=*FALSE*) > > > print(*plmEx*) > > > #To fit the PLM to all of the data, do: > > > fit(*plmEx*, verbose=verbose) > > > * > > > And here is my problem:* > > > firma <- FirmaModel(plmTr) # I have noticd that FIRMA analysis > > ONLY works > > > from the PLM based on transcripts. So when the parameter is > plmTr, I > > > wonder > > > how can it detect the splicing events of genes ? Should not the > > parameter > > > be > > > plmEx? > > > fit(firma, verbose=verbose) > > > fs <- getFirmaScores(firma) > > > > Like it says on the group web page for Exon arrays: "The FIRMA > > analysis > > ONLY works from the PLM based on transcripts". This is NOT an > error. > > That's the way it works. > > > > The manuscript gives more details for why this is the case: > > http://bioinformatics.oxfordjournals.org/cgi/content/abstract/24/15/1707 > > > > Hope that helps. > > > > Cheers, > > Mark > > > > > > > > > > > > > > > > > > > > > > > On Fri, Apr 24, 2009 at 5:24 PM, Mark Robinson > > > <mrobin...@wehi.edu.au>wrote: > > > > > >> > > >> Hi Xinjun. > > >> > > >> Here is a quick sketch of what I might do. > > >> > > >> 1. Run everything to get FIRMA scores. See group page for > running > > >> details and the Purdom Bioinformatics 2008 paper for > methodological > > >> details. > > >> > > >> 2a. If Nn or Nc > 1, use 'limma' to look for a difference in > FIRMA > > >> scores between your two groups. See threads: > > >> > > >> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/36d8c59d742fc503/ > > >> > > >> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/7d2645bd76cc2023/ > > >> > > >> 2b. If you have say patient samples (and a good number of them), > > you > > >> might expect only a subset of your C or N patients to have a > > splicing > > >> aberration. In this case, maybe you just want to look for large- > > in- > > >> magnitude FIRMA scores. > > >> > > >> ... maybe you also want to look at interesting probesets via > > >> GenomeGraphs: > > >> > > >> http://groups.google.com/group/aroma-affymetrix/web/using-the-genomegraphs-package-with-firma > > >> > > >> Cheers, > > >> Mark > > >> > > >> > > >> On 24/04/2009, at 1:01 AM, liszt wrote: > > >> > > >> > > > >> > Hi: > > >> > > > >> > Now I have N CEL files from both normal and cancer tissue. The > > two > > >> > groups contains Nn and Nc CEL files separately (Nn = Nc). I > > want to > > >> > investigate the difference in gene's splicing pattern between > > normal > > >> > tissue and cancerous tissue. Would you give me some tips? ( I > > have > > >> > read the document and got no answers) Thanks! > > >> > > > >> > Xinjun > > >> > > > > >> > > >> ------------------------------ > > >> Mark Robinson > > >> Epigenetics Laboratory, Garvan > > >> Bioinformatics Division, WEHI > > >> e: m.robin...@garvan.org.au > > >> e: mrobin...@wehi.edu.au > > >> p: +61 (0)3 9345 2628 > > >> f: +61 (0)3 9347 0852 > > >> ------------------------------ > > >> > > >> > > >> > > >> > > >> > > >> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------ > Mark Robinson > Epigenetics Laboratory, Garvan > Bioinformatics Division, WEHI > e: m.robin...@garvan.org.au > e: mrobin...@wehi.edu.au > p: +61 (0)3 9345 2628 > f: +61 (0)3 9347 0852 > ------------------------------ > > > > > > > > > > > ------------------------------ Mark Robinson Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robin...@garvan.org.au e: mrobin...@wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852 ------------------------------ --~--~---------~--~----~------------~-------~--~----~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~----------~----~----~----~------~----~------~--~---