[aroma.affymetrix] Re: Compare the splicing pattern of two samples

Mark Robinson Thu, 30 Apr 2009 00:31:59 -0700

Hi Xinjun.

Comments below.



On 30/04/2009, at 12:33 AM, Xinjun Zhang wrote:

> Hi:
>
> Sorry for the second ambiguous question. Now I will give it out in  
> another way:
>
> After limma analysis:
>
> # two groups: CEU and YRI
> >design
>                  CEU    YRIvsCEU
> GSM188687    1    0
> GSM188688    1    0
> GSM188861    1    1
> GSM188862    1    1
>
> #Limma
> fsDF <- extractDataFrame(fs, addNames=TRUE)
> fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)])
> fit<-lmFit(fsDF[,-c(1:5)],design)
> fit<-eBayes(fit)
> fit$genes<-fsDF[,1]
> topTable(fit, coef=NULL, number = 10, adjust="BH")
>
> Then I got output in RGui in this form:
>
> > topTable(fit,coef="YRIvsCEU",adjust="BH")
>             ID      logFC         t      P.Value adj.P.Val         B
> 248067 3851537  10.781228  18.01769 5.069965e-07 0.1441163  
> -4.159301        ## In this line, is "248067 " a NCBI geneid d and  
> "3851537" a probeset id?  And if the first coloum is gene ID, what  
> is strange to me is that  some of the ID is not a human gene id.
> 219150 3721400 -12.364204 -14.91257 1.798048e-06 0.2146088 -4.164325
> 90041  2903401  -8.915503 -13.79270 3.021449e-06 0.2146088 -4.166979
> 80808  2836738   7.811150  13.45085 3.568320e-06 0.2146088 -4.167917
> 250529 3862018  -7.935552 -13.33698 3.774934e-06 0.2146088 -4.168245
> 176674 3462843  10.559478  12.92835 4.637400e-06 0.2158173 -4.169490
> 224640 3744039   7.930627  12.66410 5.314668e-06 0.2158173 -4.170356
> 134937 3224650  -9.385466 -12.18252 6.860763e-06 0.2437758 -4.172073
> 155392 3352948  -6.802731 -11.71350 8.878365e-06 0.2804133 -4.173938
> 104503 3003193  -8.947865 -11.50151 1.000676e-05 0.2844473 -4.174852

Be careful here.  The first number you see here ("248067") is a row  
number that limma puts in, and is not an gene/probeset identifier of  
any kind.  The "ID" column is what you have put in fit$genes (see  
above, you have the command "fit$genes <- fsDF[,1]").

I would actually recommend that you put more in fit$genes, because  
what you have have now is only the first column of the fsDF data  
frame, which gives the transcript_cluster_id.  So, you have the  
transcript_cluster_id, but you don't know what probeset_id this  
corresponds to.

For example:

 > head(fsDF[,1:5])
   unitName groupName unit group cell
1  2315251   2315252    1     1    1
2  2315251   2315253    1     2    2
3  2315373   2315374    2     1    3
4  2315373   2315375    2     2    4
5  2315373   2315376    2     3    5
6  2315373   2315377    2     4    6

Maybe it would be better to do something like:

...
fit$genes <- fsDF[,1:2]

... that way your output table will look something like:

 > topTable(fit,coef=1,n=2)
      unitName groupName     logFC         t      P.Value     
adj.P.Val        B
4166  3581637   3582005 -2.581813 -12.65587 3.786135e-13 2.342177e-09  
19.22332
4459  2656818   2656822 -2.951434 -12.54443 4.684822e-13 2.342177e-09  
19.03846




> >
> > topTable(fit,coef=NULL,adjust.method="BH") # what is the  
> difference of the two forms of output? I mean, what does the column  
> "CEU" and "YRIvsCEU" mean?
>        ProbeID       CEU   YRIvsCEU         F      P.Value adj.P.Val
> 248067 3851537 -5.235567  10.781228 162.45279 1.705464e-06 0.4847866
> 219150 3721400  6.152369 -12.364204 111.19496 6.042656e-06 0.7176463
> 90041  2903401  4.511199  -8.915503  95.13301 1.013043e-05 0.7176463
> 80808  2836738 -3.606537   7.811150  90.99296 1.173308e-05 0.7176463
> 250529 3862018  3.970771  -7.935552  88.93762 1.265099e-05 0.7176463
> 176674 3462843 -5.305275  10.559478  83.57312 1.552603e-05 0.7176463
> 224640 3744039 -3.792070   7.930627  80.34278 1.767260e-05 0.7176463
> 134937 3224650  4.691757  -9.385466  74.20693 2.292795e-05 0.8146732
> 155392 3352948  3.399463  -6.802731  68.60307 2.962962e-05 0.9358188
> 104503 3003193  4.306241  -8.947865  66.23524 3.322071e-05 0.9443152


This is where you should read and understand what limma (and  
specifically the topTable() routine) is doing.  Briefly, when you say  
coef=NULL, you are testing the significance of all parameters  
simultaneously, which results in a moderated F-statistic.  When you  
specify a single column (e.g. coef="YRIvsCEU" as you have), you are  
testing the significance of that single parameter, and you will get a  
moderated t-statistic.



> And when I take a quick look at the output produced by the function  
> write.fit( ) :
> write.fit(fit,results=NULL, "limma_result.out",digits=3,   
> adjust="none",  method="separate")
>
> and the first line of the output file "limma_result.ou" is :
>
> Coef.CEU    Coef.YRIvsCEU    t.CEU    t.YRIvsCEU    p.value.CEU     
> p.value.YRIvsCEU    F    F.p.value    Genes
> -0.5    0.955    -1.08    1.46    0.31653    0.18906    1.07     
> 0.39541    2315251
> 0.41    -0.632    0.88    -0.96    0.4071    0.36826    0.5     
> 0.62453    2315251
> 0.168    -0.384    0.41    -0.66    0.69603    0.53134    0.22     
> 0.80748    2315373
> -0.597    1.1    -1.44    1.88    0.19287    0.10287    1.78     
> 0.23803    2315373
> 0.276    -0.396    0.68    -0.69    0.51883    0.51368    0.27     
> 0.76801    2315373
> 0.118    -0.214    0.25    -0.32    0.80819    0.75689    0.05     
> 0.94922    2315373
> -0.018    0.128    -0.04    0.21    0.96823    0.83876    0.03     
> 0.9667    2315554
> -0.027    0.191    -0.06    0.31    0.95283    0.76783    0.07     
> 0.9317    2315554
> 0.276    -0.629    0.65    -1.05    0.53693    0.33006    0.56     
> 0.59639    2315554
> 0.405    -0.552    0.88    -0.85    0.40726    0.4232    0.44     
> 0.66023    2315554
> 0.109    -0.199    0.23    -0.3    0.82444    0.77509    0.04     
> 0.95666    2315554
> -0.318    0.771    -0.63    1.08    0.54789    0.31597    0.6     
> 0.57392    2315554
> .....
>
> It got only a colomn called "Genes" ( it is probe id, I guess). So  
> how can I use this output to find out the differentially expressed  
> exons( and the corresponding genes) in the two groups? Is it clearer  
> to you? Thanks in advance!
>
> Xinjun


As I mentioned above, you should change what you put in fit$genes to  
allow you to know what probeset_id each row corresponds to.  I'm  
guessing here, but you are probably interested in the "YRIvsCEU"  
parameter, so you'd probably sort on the p.value.YRIvsCEU column ...

Cheers,
Mark




>
>
>
>
>
> On Tue, Apr 28, 2009 at 4:29 PM, Mark Robinson  
> <mrobin...@wehi.edu.au> wrote:
>
> Hi Xinjun.
>
> Comments below.
>
>
> On 28/04/2009, at 12:25 PM, Xinjun Zhang wrote:
>
> > Hi Mark:
> >
> > Thanks very much for your clarification! Now I have approached to
> > limma analysis of FIRMA score to get differentially spliced genes
> > ( and also splicing pattern of each ). But I still have some
> > difficulty to understand the code ( in red ) below in Limma  
> analysis:
> >
> >         #fs is the 'standard' FirmaSet-object
> >         fsDF <- extractDataFrame(fs, addNames=TRUE)
> >         fsDF[,-c(1:5)] <- log2(fsDF[,-c(1:5)])    # I know why log2
> > is here but confused by fsDF[,c(1:5)] .... what does this expression
> > mean?
>
>
> Note that it is -c(1:5), meaning operate on (here, take logs) all of
> the columns except 1:5 ... that is, because extractDataFrame gives
> some extra columns at the beginning that are NOT data, we only want to
> log that columns that have actual data.
>
>
> >         design <- cbind(Grp1=1,Grp2=c(rep(0,n_1),rep(1,n_2)))
> >         fit<-lmFit(fsDF[,-c(1:5)],design)
> >         fit<-eBayes(fit)
> >         fit$genes<-fsDF[,1] # Can I also get seperate splicing
> > patterns for the  two differentially spliced genes from two group
> > (control and treatment )?
>
>
> I'm not sure what you are asking here.  The probesets where the "Grp2"
> coefficient is significantly different from 0 may highlight
> differentially spliced exons.  Does that help?
>
> Mark
>
>
>
>
>
> > Thanks in advance!
> >
> > Xinjun
> >
> > On Mon, Apr 27, 2009 at 6:19 AM, Mark Robinson
> > <mrobin...@wehi.edu.au> wrote:
> >
> >
> > Hi Xinjun.
> >
> > Quick comments below.
> >
> >
> > > Hi Mark:
> > >
> > > Thanks very much for your help and I have have got a quick start
> > on a
> > > small
> > > dataset that each group (control and treatment ) contains 4
> > arrays. I have
> > > set up a file structure like this:
> > > =================================================
> > > rawDate/
> > >      controlGroup/
> > >               HuEx-1_0-st-v1/
> > >                       GSMXXXXXX.CEL
> > >                       GSMXXXXXX.CEL
> > >                       ............................
> > >
> > >      treatmentGroup/
> > >               HuEx-1_0-st-v1/
> > >                       GSMXXXXXX.CEL
> > >                       GSMXXXXXX.CEL
> > >                       ............................
> > > ==================================================*
> >
> >
> > This setup will need to be changed.  You will want to put ALL  
> samples
> > together to do the PLM fitting, normalization, FIRMA scoring, etc.
> >
> > Something like:
> >
> > rawData/
> >     thisExperiment/
> >              HuEx-1_0-st-v1/
> >                      sample1.CEL
> >                      sample2.CEL
> >                      ...
> >                      sampleN.CEL
> >
> >
> > >  This is my code ( my questions are in red):*
> > >
> > > library(aroma.affymetrix)
> > >
> > > #Getting annotation data files
> > > chipType <- "HuEx-1_0-st-v1"
> > > cdf <- AffymetrixCdfFile$byChipType(chipType)
> > > print(cdf)
> > >
> > > #Defining CEL set
> > > cs <- AffymetrixCelSet$byName("controlGroup", cdf=cdf)
> > > print(cs)
> > >
> > > #Background Adjustment and Normalization
> > > bc <- RmaBackgroundCorrection(cs)
> > > csBC <- process(bc,verbose=verbose)
> > >
> > > #quantile normalization
> > > qn <- QuantileNormalization(csBC, typesToUpdate="pm")  ### I set  
> the
> > > second
> > > parameter as "pm" as the chip type is Affymetrix exon array, is  
> that
> > > right?
> > > print(qn)
> > > csN <- process(qn, verbose=verbose)
> >
> > This is fine.
> >
> > >
> > > #Summarization
> > > getCdf(csN)
> > > ## * Fit exon-by-exon*, change the value of mergeGroups to FALSE
> > in the
> > > ExonRmaPlm() call above.
> > > *plmEx *<- ExonRmaPlm(csN, mergeGroups=*FALSE*)
> > > print(*plmEx*)
> > > #To fit the PLM to all of the data, do:
> > > fit(*plmEx*, verbose=verbose)
> > > *
> > > And here is my problem:*
> > > firma <- FirmaModel(plmTr)  # I have noticd that FIRMA analysis
> > ONLY works
> > > from the PLM based on transcripts. So when the parameter is  
> plmTr, I
> > > wonder
> > > how can it detect the splicing events of genes ? Should not the
> > parameter
> > > be
> > > plmEx?
> > > fit(firma, verbose=verbose)
> > > fs <- getFirmaScores(firma)
> >
> > Like it says on the group web page for Exon arrays: "The FIRMA
> > analysis
> > ONLY works from the PLM based on transcripts".  This is NOT an  
> error.
> > That's the way it works.
> >
> > The manuscript gives more details for why this is the case:
> > http://bioinformatics.oxfordjournals.org/cgi/content/abstract/24/15/1707
> >
> > Hope that helps.
> >
> > Cheers,
> > Mark
> >
> >
> >
> >
> >
> >
> >
> > >
> > >
> > > On Fri, Apr 24, 2009 at 5:24 PM, Mark Robinson
> > > <mrobin...@wehi.edu.au>wrote:
> > >
> > >>
> > >> Hi Xinjun.
> > >>
> > >> Here is a quick sketch of what I might do.
> > >>
> > >> 1. Run everything to get FIRMA scores.  See group page for  
> running
> > >> details and the Purdom Bioinformatics 2008 paper for  
> methodological
> > >> details.
> > >>
> > >> 2a. If Nn or Nc > 1, use 'limma' to look for a difference in  
> FIRMA
> > >> scores between your two groups.  See threads:
> > >>
> > >> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/36d8c59d742fc503/
> > >>
> > >> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/7d2645bd76cc2023/
> > >>
> > >> 2b. If you have say patient samples (and a good number of them),
> > you
> > >> might expect only a subset of your C or N patients to have a
> > splicing
> > >> aberration.  In this case, maybe you just want to look for large-
> > in-
> > >> magnitude FIRMA scores.
> > >>
> > >> ... maybe you also want to look at interesting probesets via
> > >> GenomeGraphs:
> > >>
> > >> http://groups.google.com/group/aroma-affymetrix/web/using-the-genomegraphs-package-with-firma
> > >>
> > >> Cheers,
> > >> Mark
> > >>
> > >>
> > >> On 24/04/2009, at 1:01 AM, liszt wrote:
> > >>
> > >> >
> > >> > Hi:
> > >> >
> > >> > Now I have  N CEL files from both normal and cancer tissue. The
> > two
> > >> > groups contains Nn and Nc CEL files separately (Nn = Nc). I
> > want to
> > >> > investigate the difference in gene's splicing pattern  between
> > normal
> > >> > tissue and cancerous tissue. Would you give me some tips? ( I
> > have
> > >> > read the document and got no answers) Thanks!
> > >> >
> > >> > Xinjun
> > >> > >
> > >>
> > >> ------------------------------
> > >> Mark Robinson
> > >> Epigenetics Laboratory, Garvan
> > >> Bioinformatics Division, WEHI
> > >> e: m.robin...@garvan.org.au
> > >> e: mrobin...@wehi.edu.au
> > >> p: +61 (0)3 9345 2628
> > >> f: +61 (0)3 9347 0852
> > >> ------------------------------
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> >
> > >>
> > >
> > > >
> > >
> >
> >
> >
> >
> >
> >
> > >
> >
>
> ------------------------------
> Mark Robinson
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robin...@garvan.org.au
> e: mrobin...@wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> ------------------------------
>
>
>
>
>
>
>
>
> >
>

------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------





--~--~---------~--~----~------------~-------~--~----~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~----------~----~----~----~------~----~------~--~---

[aroma.affymetrix] Re: Compare the splicing pattern of two samples

Reply via email to