[aroma.affymetrix] MoGene ST 1.0 CDF file

2011-08-25 Thread sabrina
Hello:
I used PdInfo2Cdf to create my own CDF for Mouse Gene ST 1.0  .
However, when I compared with r3 version available from
aroma.affymetrix, I see a small difference:

from r3 version

AffymetrixCdfFile:
Path: annotationData/chipTypes/MoGene-1_0-st-v1
Filename: MoGene-1_0-st-v1,r3.cdf
Filesize: 17.63MB
Chip type: MoGene-1_0-st-v1,r3
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 1050x1050
Number of cells: 1102500
Number of units: 35512
Cells per unit: 31.05
Number of QC units: 1


from my own oversion


AffymetrixCdfFile:
Path: annotationData/chipTypes/MoGene-1_0-st-v1
Filename: MoGene-1_0-st-v1,pd.cdf
Filesize: 17.78MB
Chip type: MoGene-1_0-st-v1,pd
RAM: 0.00MB
File format: v4 (binary; XDA)
Dimension: 1050x1050
Number of cells: 1102500
Number of units: 35556
Cells per unit: 31.01
Number of QC units: 0

..
so the number of  units is different, so is the number of QC units

Can someone tell me what the number of QC units is and how did mine
miss that?

Thanks

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


[aroma.affymetrix] exon and psr

2011-04-20 Thread sabrina
Hello:
I used FIRMA to detect AS events. I have noticed that in some cases,
one exon were represented by multiple psrs, and only one of them was
detected as AS event. in this case, I would suspect that this
particular AS event is not real. However, in terms of coding, is there
any easy way to map psrs to exon and detect AS based on exon instead
of psrs? The problem arises when affy has multiple psrs mapped to a
region where several transcripts have one or multiple exons in that
region.

Any suggestions appreciated!

Sabrina

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


[aroma.affymetrix] Firma score help

2011-01-04 Thread sabrina
Hello, all:
I have 16 exon arrays from 4 groups, A1, A2 and B1, B2. As are
genetically identical but with different treatment,Bs are genetically
identical (but different from As) with different treatment 1 and 2. I
am interested in finding alternative splicing events that was affected
by treatment on A and B and also by genetics (under same treatement).
Therefore, the comparison I am interested are A1 vs. A2, B1vsB2, A1 vs
B1 (only treatment 1) . My question is, when I do RMA and calculate
the FIRMA score, do I use all 16 array I have to get Firma scores,
then use limma as suggested before to apply to design matrix and
contrast matrix? Or should I for each comparison , do RMA and FIRMA
score for exon arrays only involved in that comparison? Thanks for
your input!

Sabrina

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/


[aroma.affymetrix] using LmFit to FIRMA score

2010-01-27 Thread sabrina
Hi, all
A quick question about using lmFit on firma scores from exon array
analysis. According to the vignettes, to detect splicing events, we
need to use lmFit to find the  significant event as following (my code
so far):

fit <- lmFit(exFirma[,3:ncol(exFirma)] ,design)
fit2<-contrasts.fit(fit,cont.matrix);
fit2<-eBayes(fit2,proportion=p);
topList<-topTable(fit2,coef=contrast[k],adj="fdr");


What this does, to my understanding, is to find firma scores that are
significantly different between your contrast groups, say one group is
all negative, the other is all positive. But we know exon splicing
event is not usually 100%, so it is very likely some samples in one
group have -4, -3, and the others in the same group might be 1 or 0.5.
but if that is the case, since lmFit basically uses the moderated t-
test, will it detect this as significant? If what I just described is
reasonable, then will I be better off if I just use the m=median(firma
score) for each group and rank the the differences of my contrast
group (m1-m2) and find out the most potential splicing events? I guess
I am a little bit confused. Thanks


Sabrina

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en


[aroma.affymetrix] Re: FIRMA SCORE from different test

2010-01-11 Thread sabrina
Hi, Mark:
Thanks for the reply. I actually ended up with the #1 as you
suggested. The reason I wanted to do this comparison is that I know by
adding more data sets, the firma score will be different since it
takes into consideration of each sample and each probe of exon of
interest.

 What is interesting is that, after I used #1 dataset and computed
firma score, I did another experiment, one is using all firma scores,
made design matrix and contrast matrix with all samples, trying to
find the top list of comparison A. The other one is to only use firma
scores that only involve the comparison A, made design matrix and
contrast matrix with only samples involved in comparison A, and got
result. As you can imagine, the p values, t values for each exon was
quite different. As I digged more into it, I realized that the degree
of freedom of residuals for these two tests were quite different, the
first one is much bigger than the second dof because it was based on
all samples. So I wonder if by including all samples, I accidentally
increased significance of exon splicing event?

Thanks!

Sabrina

On Jan 11, 12:08 am, Mark Robinson  wrote:
> Hi Sabrina.
>
> Some comments below.
>
> On 24-Dec-09, at 2:39 AM, sabrina wrote:
>
> > Hi, everyone:
> > I am trying to use aroma to detect splicing events. My dataset
> > consists 3 groups (genetically different) with one as control, the
> > other two as mutants. each group also had control and treatment
> > subgroups. My interest is to compare mutant with control, under normal
> > condition,and compare control under two condition (normal and
> > treatment), and interactions among mutant and control with condition.
>
> > I did two different runs for the second comparison (control group
> > under two conditions): 1. using all groups , all conditions to do
> > normalization, plmFit, firmaScore and use limFit with specific
> > contrast matrix to find the splicing events. 2. only use control
> > group, under two conditions, do normalization, plmFit, firmaScore, and
> > limFit to find the splicing events.
>
> I'm not sure whether such a comparison will lead you to anything  
> meaningful.  It seems like the best approach "should" be to use all  
> data together, since that should allow the best estimates of probe  
> effects to be estimated.  Your #1 and #2 don't seem comparable to me.
>
> > I compared the results from these two runs, they were quite different
> > and B values from the topTable were very different.  I wonder what is
> > the right choice to fit my objective. Thanks
>
> If you did this with gene expression data in limma, you'd probably  
> find the same thing -- process the data in different ways and you'll  
> get different B values.
>
> Cheers,
> Mark
>
>
>
> > Sabrina
>
> > --
> > When reporting problems on aroma.affymetrix, make sure 1) to run the  
> > latest version of the package, 2) to report the output of  
> > sessionInfo() and traceback(), and 3) to post a complete code example.
>
> > You received this message because you are subscribed to the Google  
> > Groups "aroma.affymetrix" group.
> > To post to this group, send email to aroma-affymetrix@googlegroups.com
> > To unsubscribe from this group, send email to 
> > aroma-affymetrix-unsubscr...@googlegroups.com
> > For more options, visit this group 
> > athttp://groups.google.com/group/aroma-affymetrix?hl=en
>
> --
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robin...@garvan.org.au
> e: mrobin...@wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> --
>
> __
> The information in this email is confidential and intended solely for the 
> addressee.
> You must not disclose, forward, print or use it without the permission of the 
> sender.
> __
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en

[aroma.affymetrix] Re: firma score comparison

2010-01-11 Thread sabrina
Hi, Mark:
Thanks! So I guess using adjusted p value to cut off the significant
events will be a better choice than using B values?

Sabrina

On Jan 11, 12:19 am, Mark Robinson  wrote:
> Hi Sabrina.
>
> This is really a limma question, not an aroma.affymetrix one, but I'll  
> take a quick stab.  Basically, you'll notice that regardless of the  
> value you set for 'p', you'll get the same t-statistics and p-values.  
> It only changes the B values (posterior log odds).
>
> Cheers,
> Mark
>
> On 9-Jan-10, at 1:14 AM, sabrina wrote:
>
>
>
> > Hi, all:
> > A quick question. I know after we extract frima score, I should use
> > eBayes to do the following:
>
> > assume we have design matrix and contrast matrix,
> > fit<-lmFit(exFirma,design)
> > fit2<-contrasts.fit(fit,cont.matrix);
> > fit2<-eBayes(fit2,proportion=p);
>
> > the default setting for p is 0.01 that is for gene expression. Is 0.01
> > also a good guess for this firma score purpose? Or should I use 0.001
> > or some other values? Thanks!
>
> > Sabrina
> > --
> > When reporting problems on aroma.affymetrix, make sure 1) to run the  
> > latest version of the package, 2) to report the output of  
> > sessionInfo() and traceback(), and 3) to post a complete code example.
>
> > You received this message because you are subscribed to the Google  
> > Groups "aroma.affymetrix" group.
> > To post to this group, send email to aroma-affymetrix@googlegroups.com
> > To unsubscribe from this group, send email to 
> > aroma-affymetrix-unsubscr...@googlegroups.com
> > For more options, visit this group 
> > athttp://groups.google.com/group/aroma-affymetrix?hl=en
>
> --
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robin...@garvan.org.au
> e: mrobin...@wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> --
>
> __
> The information in this email is confidential and intended solely for the 
> addressee.
> You must not disclose, forward, print or use it without the permission of the 
> sender.
> __
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en

[aroma.affymetrix] firma score comparison

2010-01-08 Thread sabrina
Hi, all:
A quick question. I know after we extract frima score, I should use
eBayes to do the following:

assume we have design matrix and contrast matrix,
fit<-lmFit(exFirma,design)
fit2<-contrasts.fit(fit,cont.matrix);
fit2<-eBayes(fit2,proportion=p);

the default setting for p is 0.01 that is for gene expression. Is 0.01
also a good guess for this firma score purpose? Or should I use 0.001
or some other values? Thanks!


Sabrina
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en

[aroma.affymetrix] FIRMA SCORE from different test

2009-12-23 Thread sabrina
Hi, everyone:
I am trying to use aroma to detect splicing events. My dataset
consists 3 groups (genetically different) with one as control, the
other two as mutants. each group also had control and treatment
subgroups. My interest is to compare mutant with control, under normal
condition,and compare control under two condition (normal and
treatment), and interactions among mutant and control with condition.

I did two different runs for the second comparison (control group
under two conditions): 1. using all groups , all conditions to do
normalization, plmFit, firmaScore and use limFit with specific
contrast matrix to find the splicing events. 2. only use control
group, under two conditions, do normalization, plmFit, firmaScore, and
limFit to find the splicing events.

I compared the results from these two runs, they were quite different
and B values from the topTable were very different.  I wonder what is
the right choice to fit my objective. Thanks


Sabrina

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en


[aroma.affymetrix] Re: EXON question

2009-11-18 Thread sabrina
Hi, Mark:

On Nov 17, 11:46 pm, Mark Robinson  wrote:
> Hi Sabrina.
>
> Comments below.
>
> > Hi, Mark:
> > Thanks for the information. What I worried about using coordinate is
> > that coordinate changes with assembly, while sequences do not change.
> > I don't know how many psrs are out of exon boundary, I will look into
> > it. But here is an example:
> > group Number: 5092691:
> > CTTATCGAGATAGTGCTTCTGTGGGTCAATCTAGATATTGATAGATTTGGACTGGAGAAG
>
> > when I used blat from ensembl, it showed that it is out of exon
> > boundary.
>
> I guess I mean: do you have an example of a (core) probe that maps to  
> a location not within an bona fide exon?  What's above is certainly  
> not an Affymetrix 25-mer probe, so I'm a bit confused.


This is the sequence of the psr or from the unitGroup. The file I used
was downloaded from Affy website. According to the website:
Probe set sequences consist of the contiguous genomic sequence
starting at the beginning of the first probe and ending at the end of
the last probe in the set as they are aligned to the genome. They are
provided in the orientation they exist in the mRNA in 5'->3'
direction.

so my hypothesis is that if all of the probes in psr are in the exon
region, then this psr sequence should be in it as well, right?


>
> > Forgot to ask another question. Back to my original question, is there
> > an (easy) way to map from  partial sequence (i.e. the probeset
> > sequence) to exon sequence in a batch mode or in R? Thanks!
>
> This is not really an aroma.affymetrix question and there are various  
> answers to this on the Bioconductor mailing list.  Personally, I  
> usually use the findOverlaps() function in IRanges package.  Its super  
> quick.  Here is an rough sketch of an example (please don't take this  
> and assume it work work for you ... this is just to direct you toward  
> a way to do it):


I will look into it! Thanks a lot :)


Sabrina
>
> library(IRanges)
>
> # assume you have 3 corresponding vectors for the exons:
> # 'exonStart', 'exonEnd', 'exonChr'
> exonIranges <- mapply(IRanges, start = split(exonStart, exonChr),
>                               end = split(exonEnd, exonChr))
> exonL <- do.call(RangesList, exonIranges)
>
> # similarly, corresponding vectors for the 25-mer Affy probes
> # 'sp' is position, 'ch' is chromosome
> probeRanges <- mapply(IRanges, start = split(sp, ch),
>                               end = split(sp+24, ch))
> probeL <- do.call(RangesList, probeRanges)
>
> # you may wish to make sure that these lists cover the same chromosomes
> fo <- findOverlaps(probeL, exonL)
>
> I'll leave it as an exercise to the reader to unwrap the contents of  
> the 'fo' object [Hint: see the as.table() method].  In your case,  
> you'd be interested to know how many probes do not "overlap" within  
> exon boundaries and I'd guess that you'll be careful what set of  
> probes you use (e.g. core only?).
>
> Hope that helps.
>
> Cheers,
> Mark
>
> > Of course, if I made a mistake when I compiled the CDF, then that is
> > different story. I hope not.
>
> > Any suggestions? Thanks!
>
> > On Nov 16, 8:10 pm, Mark Robinson  wrote:
> >> Hi folks.
>
> >> Note that you can download these directly with the R/Bioconductor
> >> package 'rtracklayer'.  For example:
>
> >> library(rtracklayer)
> >> session <- browserSession("UCSC")
> >> q1 <- ucscTableQuery(session, "refGene", GenomicRanges(genome =  
> >> "hg18"))
> >> refGene <- getTable(q1)
>
> >> Sabrina:  I'm actually surprised that many probes lie outside exon
> >> boundaries.  They were specifically designed to be inside.  Of  
> >> course,
> >> the array was designed on annotation from a few years ago, but  
> >> still I
> >> would expect this to be minimal.  Can you give some numbers on this?
> >> Or, some examples.
>
> >> Cheers,
> >> Mark
>
> >> On 17-Nov-09, at 3:41 AM, camelbbs wrote:
>
> >>>http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
> >>> refFlat.txt.gz  and refGene.txt.gz are the similar.
>
> >>> On Nov 15, 5:02 pm, sabrina  wrote:
> >>>> Hi, Jiang:
> >>>> Can you give me more detail about UCSC RefGene file? Is there a  
> >>>> link
> >>>> to download? Thanks!
>
> >>>> Sabrina
>
> >>>> On Nov 15, 11:38 am, camelbbs  wrote:
>
>

[aroma.affymetrix] Re: EXON question

2009-11-17 Thread sabrina
Hi, Mark:
Forgot to ask another question. Back to my original question, is there
an (easy) way to map from  partial sequence (i.e. the probeset
sequence) to exon sequence in a batch mode or in R? Thanks!

Sabrina

On Nov 16, 8:10 pm, Mark Robinson  wrote:
> Hi folks.
>
> Note that you can download these directly with the R/Bioconductor  
> package 'rtracklayer'.  For example:
>
> library(rtracklayer)
> session <- browserSession("UCSC")
> q1 <- ucscTableQuery(session, "refGene", GenomicRanges(genome = "hg18"))
> refGene <- getTable(q1)
>
> Sabrina:  I'm actually surprised that many probes lie outside exon  
> boundaries.  They were specifically designed to be inside.  Of course,  
> the array was designed on annotation from a few years ago, but still I  
> would expect this to be minimal.  Can you give some numbers on this?  
> Or, some examples.
>
> Cheers,
> Mark
>
> On 17-Nov-09, at 3:41 AM, camelbbs wrote:
>
>
>
> >http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
> > refFlat.txt.gz  and refGene.txt.gz are the similar.
>
> > On Nov 15, 5:02 pm, sabrina  wrote:
> >> Hi, Jiang:
> >> Can you give me more detail about UCSC RefGene file? Is there a link
> >> to download? Thanks!
>
> >> Sabrina
>
> >> On Nov 15, 11:38 am, camelbbs  wrote:
>
> >>> hi,
> >>> I think you can get the probeset coordinates from affy exon array
> >>> annotation files, and you can get exon coordinates from the UCSC
> >>> RefGene files. Then you can compare them.
> >>> Best,
> >>> Jiang
>
> >>> On Nov 15, 6:56 am, sabrina  wrote:
>
> >>>> Hello, all:
> >>>> I used FIRMA to find potential spliced genes. But as we know, the
> >>>> probesets from affy exon array could be out of exon boundary or  
> >>>> just
> >>>> cover part of exons. I wonder if I have the sequence of the  
> >>>> probeset
> >>>> (which I get from Affy website), how do I do it in batch to find
> >>>> whether it is in an exon (ENSEMBL) region , or if it is, how do I  
> >>>> get
> >>>> the entire exon sequence and coordinates? Thanks a lot!
>
> >>>> Sabrina
>
> > --
> > When reporting problems on aroma.affymetrix, make sure 1) to run the  
> > latest version of the package, 2) to report the output of  
> > sessionInfo() and traceback(), and 3) to post a complete code example.
>
> > You received this message because you are subscribed to the Google  
> > Groups "aroma.affymetrix" group.
> > To post to this group, send email to aroma-affymetrix@googlegroups.com
> > To unsubscribe from this group, send email to 
> > aroma-affymetrix-unsubscr...@googlegroups.com
> > For more options, visit this group 
> > athttp://groups.google.com/group/aroma-affymetrix?hl=en
>
> --
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robin...@garvan.org.au
> e: mrobin...@wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> --

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en


[aroma.affymetrix] Re: EXON question

2009-11-17 Thread sabrina
Hi, Mark:
Thanks for the information. What I worried about using coordinate is
that coordinate changes with assembly, while sequences do not change.
I don't know how many psrs are out of exon boundary, I will look into
it. But here is an example:
group Number: 5092691:
CTTATCGAGATAGTGCTTCTGTGGGTCAATCTAGATATTGATAGATTTGGACTGGAGAAG

when I used blat from ensembl, it showed that it is out of exon
boundary.

Of course, if I made a mistake when I compiled the CDF, then that is
different story. I hope not.

Any suggestions? Thanks!

On Nov 16, 8:10 pm, Mark Robinson  wrote:
> Hi folks.
>
> Note that you can download these directly with the R/Bioconductor  
> package 'rtracklayer'.  For example:
>
> library(rtracklayer)
> session <- browserSession("UCSC")
> q1 <- ucscTableQuery(session, "refGene", GenomicRanges(genome = "hg18"))
> refGene <- getTable(q1)
>
> Sabrina:  I'm actually surprised that many probes lie outside exon  
> boundaries.  They were specifically designed to be inside.  Of course,  
> the array was designed on annotation from a few years ago, but still I  
> would expect this to be minimal.  Can you give some numbers on this?  
> Or, some examples.
>
> Cheers,
> Mark
>
> On 17-Nov-09, at 3:41 AM, camelbbs wrote:
>
>
>
> >http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
> > refFlat.txt.gz  and refGene.txt.gz are the similar.
>
> > On Nov 15, 5:02 pm, sabrina  wrote:
> >> Hi, Jiang:
> >> Can you give me more detail about UCSC RefGene file? Is there a link
> >> to download? Thanks!
>
> >> Sabrina
>
> >> On Nov 15, 11:38 am, camelbbs  wrote:
>
> >>> hi,
> >>> I think you can get the probeset coordinates from affy exon array
> >>> annotation files, and you can get exon coordinates from the UCSC
> >>> RefGene files. Then you can compare them.
> >>> Best,
> >>> Jiang
>
> >>> On Nov 15, 6:56 am, sabrina  wrote:
>
> >>>> Hello, all:
> >>>> I used FIRMA to find potential spliced genes. But as we know, the
> >>>> probesets from affy exon array could be out of exon boundary or  
> >>>> just
> >>>> cover part of exons. I wonder if I have the sequence of the  
> >>>> probeset
> >>>> (which I get from Affy website), how do I do it in batch to find
> >>>> whether it is in an exon (ENSEMBL) region , or if it is, how do I  
> >>>> get
> >>>> the entire exon sequence and coordinates? Thanks a lot!
>
> >>>> Sabrina
>
> > --
> > When reporting problems on aroma.affymetrix, make sure 1) to run the  
> > latest version of the package, 2) to report the output of  
> > sessionInfo() and traceback(), and 3) to post a complete code example.
>
> > You received this message because you are subscribed to the Google  
> > Groups "aroma.affymetrix" group.
> > To post to this group, send email to aroma-affymetrix@googlegroups.com
> > To unsubscribe from this group, send email to 
> > aroma-affymetrix-unsubscr...@googlegroups.com
> > For more options, visit this group 
> > athttp://groups.google.com/group/aroma-affymetrix?hl=en
>
> --
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robin...@garvan.org.au
> e: mrobin...@wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> --

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en


[aroma.affymetrix] Re: EXON question

2009-11-15 Thread sabrina
Hi, Jiang:
Can you give me more detail about UCSC RefGene file? Is there a link
to download? Thanks!

Sabrina

On Nov 15, 11:38 am, camelbbs  wrote:
> hi,
> I think you can get the probeset coordinates from affy exon array
> annotation files, and you can get exon coordinates from the UCSC
> RefGene files. Then you can compare them.
> Best,
> Jiang
>
> On Nov 15, 6:56 am, sabrina  wrote:
>
> > Hello, all:
> > I used FIRMA to find potential spliced genes. But as we know, the
> > probesets from affy exon array could be out of exon boundary or just
> > cover part of exons. I wonder if I have the sequence of the probeset
> > (which I get from Affy website), how do I do it in batch to find
> > whether it is in an exon (ENSEMBL) region , or if it is, how do I get
> > the entire exon sequence and coordinates? Thanks a lot!
>
> > Sabrina

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en


[aroma.affymetrix] EXON question

2009-11-15 Thread sabrina
Hello, all:
I used FIRMA to find potential spliced genes. But as we know, the
probesets from affy exon array could be out of exon boundary or just
cover part of exons. I wonder if I have the sequence of the probeset
(which I get from Affy website), how do I do it in batch to find
whether it is in an exon (ENSEMBL) region , or if it is, how do I get
the entire exon sequence and coordinates? Thanks a lot!

Sabrina

-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en


[aroma.affymetrix] Re: exon array probe level value

2009-09-18 Thread sabrina

Hi, Mark:


On Sep 6, 3:58 am, Mark Robinson  wrote:
> Hi Sabrina.
>
> Please keep responses on-list in case others have a similar question.

I think I just accidentally hit the reply to author, sorry about that!

I solved the CDF problem because of  one typo :)
>
> > The other question i have is that even if I use the existing CDF file,
> > some of the gene expression or probe level expressions are really low,
> > like 1.1. It seemed to me t hat they are just too low. These were
> > obtained after the background normalization etc.
> > I used Rmabackgroundcorrection
>
> I need a little more convincing before I get too worried about these  
> things.  Is there any reason to suspect there is something wrong  
> here?  What does the raw (i.e. before BG adjustment) data look like  
> for these units that have low expression?  What does the data for this  
> unit look like after BG adjustment but before normalization?  What  
> does the data for this unit look like before PLM fitting but after  
> normalization?   Anything unusual in the commands you have run?  Like  
> taking logs twice or BG adjusting data that has already been  
> preprocessed, etc.?


I am still struggling about this issue. What I am trying to figure out
is how to filter out background noise. I know that in the AROMA
process, I do background correction, then normalization, then firma
model at probeset level. After I finished normalization, (ExonRmaPlm),
I took a look at the gene level and probe level distribution, here is
an example:

summary(pbsetSumm) #this is for each probe on the array
#output:
 Min.   : 0.8715   Min.   : 0.9028   Min.   : 0.9475   Min.   :
0.9499
 1st Qu.: 4.6625   1st Qu.: 4.5951   1st Qu.: 4.6517   1st Qu.:
4.6143
 Median : 6.3976   Median : 6.3863   Median : 6.3850   Median :
6.3803
 Mean   : 6.3778   Mean   : 6.3743   Mean   : 6.3652   Mean   :
6.3766
 3rd Qu.: 7.9876   3rd Qu.: 8.0125   3rd Qu.: 7.9833   3rd Qu.:
8.0051
 Max.   :14.4451   Max.   :14.3972   Max.   :14.4558   Max.   :
14.4009

In the original FIRMA paper, I noticed it filtered out all probes has
expression level <3 (log2 scale) and in the simulation study, you used
the two different values of Muc (7 and 10) to mimic two different
scenarios, one close to background, one far above background.

If I filter out these expression level <3 (only for the first array),
I will filter out 16760 probes , which is about 8% of probes, but then
the question is: if I want to detect exon skipping, the transcript
with skipped exons should have background level expression for that
exon , am I correct? If I am correct, then if I filter out these so-
called background noise probes, will I miss these exon skipping
events? i am a little bit confused and any suggestions are welcome!
Thanks!

Sabrina

> Just some thoughts.
>
> Cheers,
> Mark
>
> On 4-Sep-09, at 12:49 AM, sabrina wrote:
>
>
>
> > Hi, Mark:
> > How do you test if the one I created is similar to the existing core
> > CDF? I don't think I include the extended or FUll probes, because I
> > only included the probes with notation of core from Affy's annotation
> > file
> > Here is the display when I used the existing core CDF
>
> > Path: annotationData/chipTypes/MoEx-1_0-st-v1
> > Filename: MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf
> > Filesize: 30.53MB
> > Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR
> > RAM: 0.00MB
> > File format: v4 (binary; XDA)
> > Dimension: 2560x2560
> > Number of cells: 6553600
> > Number of units: 17831
> > Cells per unit: 367.54
> > Number of QC units: 1
>
> > when I check one of the units
> > $`6838637`
> > $`6838637`$type
> > [1] 1
>
> > $`6838637`$direction
> > [1] 1
>
> > from the existing CDF:
> > $`6838637`$groups$`4736172`
> > $`6838637`$groups$`4736172`$x
> > [1] 2273 1148  816 1391
>
> > $`6838637`$groups$`4736172`$y
> > [1]  832 1172 2502  540
>
> > $`6838637`$groups$`4736172`$pbase
> > [1] "C" "T" "A" "G"
>
> > $`6838637`$groups$`4736172`$tbase
> > [1] "G" "A" "T" "C"
>
> > $`6838637`$groups$`4736172`$expos
> > [1] 0 1 2 3
>
> > $`6838637`$groups$`4736172`$direction
> > [1] 1
>
> > ---
> > Here is the one I created myself
> > Path: annotationData/chipTypes/MoEx-1_0-st-v1
> > Filename: MoEx-1_0-st-v1,core,2009-09-02,no1Probe,no1Exon,SS.cdf
> > Filesize: 28.14MB
> > Chip type: MoEx-1_0-st-v1,core,2009-09-02,no1Probe,no1Exon,SS
> > RAM: 0.00MB
> > File format: v4 (binary; XDA)
> > Dimension: 2560x2560
> > Nu

[aroma.affymetrix] Re: smoothScatter plot

2009-09-09 Thread sabrina

Hi, Henrik:
It worked! Thanks

Sabrina



> library(aroma.affymetrix)
Loading required package: aroma.core
Loading required package: R.cache
R.cache v0.1.7 (2008-02-27) successfully loaded. See ?R.cache for
help.
Error: package 'R.filesets' 0.5.2 was found, but >= 0.5.3 is required
by 'aroma.core'


On Sep 5, 9:21 pm, Henrik Bengtsson 
wrote:
> Hi,
>
> this occurs with geneplotter v1.21.5 or newer.  This is because we
> called geneplotter::smoothScatter(), but that function was moved from
> geneplotter v1.21.4 to the graphics package of R v2.9.0.  When you say
> it used to work, this was probably when you had R v2.8.x.
>
> I've provided a fix that remains backward compatibility.  This will be
> available for everyone in the next major release.  For now, you can
> try:
>
> source("http://www.braju.com/R/hbLite.R";)
>
> # Unix-like systems
> installPackages("http://www.braju.com/R/repos/aroma.core_1.1.6.tar.gz";);
> installPackages("http://www.braju.com/R/repos/aroma.affymetrix_1.1.4.tar.gz";);
>
> # Windows systems
> installPackages("http://www.braju.com/R/repos/aroma.core_1.1.6.zip";);
> installPackages("http://www.braju.com/R/repos/aroma.affymetrix_1.1.4.zip";);
>
> and restart R.
>
> Does it work?
>
> /Henrik
>
> On Fri, Sep 4, 2009 at 7:29 AM, sabrina wrote:
>
> > Hi, all:
> > I was trying the example of testing quality of cel files by using
> > smoothScatter.
> > I installed geneplotter but when I ran the code, it gave me the
> > following error:
>
> > Error: 'smoothScatter' is not an exported object from
> > 'namespace:geneplotter'
>
> > Can anyone help me about this error?
>
> > I ran the program before, it worked fine . I am using R2.9.1. Thanks
>
> > Sabrina
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] smoothScatter plot

2009-09-04 Thread sabrina

Hi, all:
I was trying the example of testing quality of cel files by using
smoothScatter.
I installed geneplotter but when I ran the code, it gave me the
following error:

Error: 'smoothScatter' is not an exported object from
'namespace:geneplotter'

Can anyone help me about this error?

I ran the program before, it worked fine . I am using R2.9.1. Thanks

Sabrina
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: exon array probe level value

2009-09-03 Thread sabrina

Hi, Mark:
The other question i have is that even if I use the existing CDF file,
some of the gene expression or probe level expressions are really low,
like 1.1. It seemed to me t hat they are just too low. These were
obtained after the background normalization etc. Is there any way that
I can tell whether they are just background noise? I used
Rmabackgroundcorrection

Thanks

Sabrina

On Sep 2, 6:17 pm, Mark Robinson  wrote:
> Hi Sabrina.
>
> Hmm, that does seem unexpected.  I would expect that if there are just  
> a few probes picked out, it shouldn't change that much.
>
> Can you give more details about this new CDF you made?  Would it be  
> similar to the existing 'core' CDF?  Presumably, there will be a large  
> overlap with the existing CDF.  Can you verify this?  And, give some  
> numbers on this.  A few things like: total number of probes for each  
> CDF, for a given unit does it have a large overlap of probes, etc.
>
> The one thing that comes to mind is that maybe you've included a bunch  
> of the 'full' or 'extended' probes in your new CDF and this would  
> likely lower the average.
>
> Cheers,
> Mark
>
> On 3-Sep-09, at 5:07 AM, sabrina shao wrote:
>
>
>
>
>
> > Hi, all
> > I am working on mouse Exon arrays from Affy. Because there are probes
> > with multiple hits on the genome, I screened out these probes and
> > exons with only one probe and transcript clusters with one exon on the
> > array. After that I created a CDF file as illustrated from this group.
> > My problems is: when I checked the probe level expressions (not gene
> > expression) , the average is about 4.5 , and the gene expression level
> > ( not surprisingly) is about similar level. And the smallest probe
> > level was about 1.1.  I also used the CDF file I downloaded from aroma
> > here, using the same program, I got the gene expression level around
> > 6.2. That is about 1.7 fold change. and the results from the new CDF
> > that I created was just too low. I can't figure it out why. Can anyone
> > give me some suggestions and hints? Thanks
>
> > Sabrina
>
> --
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robin...@garvan.org.au
> e: mrobin...@wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> --
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] exon array probe level value

2009-09-02 Thread sabrina shao

Hi, all
I am working on mouse Exon arrays from Affy. Because there are probes
with multiple hits on the genome, I screened out these probes and
exons with only one probe and transcript clusters with one exon on the
array. After that I created a CDF file as illustrated from this group.
My problems is: when I checked the probe level expressions (not gene
expression) , the average is about 4.5 , and the gene expression level
( not surprisingly) is about similar level. And the smallest probe
level was about 1.1.  I also used the CDF file I downloaded from aroma
here, using the same program, I got the gene expression level around
6.2. That is about 1.7 fold change. and the results from the new CDF
that I created was just too low. I can't figure it out why. Can anyone
give me some suggestions and hints? Thanks

Sabrina

--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

2009-06-22 Thread sabrina

Hi, Mark
I tried to run flat2Cdf, it ran towards end but told me that the
memory  was not enough, I am running windows xp with 4G ram. But it
did read file, split file into units etc. So I tried on Linux server,
but it gave me read.table error (same code I used on my PC)
>>>>>>
Reading TXT file ...Error in read.table(file, header = TRUE,
colClasses = col.class, stringsAsFactors = FALSE,  :
unused argument(s) (stringsAsFactors ...)
Execution halted
>>>>>>>
Can you suggest possible solutions? Thanks

Sabrina

On Jun 11, 9:41 am, Mark Robinson  wrote:
> Hi Sabrina.
>
> The Unit_ID can be any "transcript cluster" identifier of your  
> choice.  The easiest may be to use the Affymetrix transcript cluster  
> identifier itself ... available from:
>
> http://www.affymetrix.com/analysis/downloads/current_exon/MoEx-1_0-st...
>
> See the 'transcript_cluster_id' column.  Perhaps only take the "core"  
> probes, as defined in the the 'level' column?
>
> Note: we used Ensembl in that flat2Cdf() example since we were using a  
> custom organization (i.e. non-Affy) of the probesets.
>
> Cheers,
> Mark
>
> On 11/06/2009, at 10:58 PM, sabrina wrote:
>
>
>
>
>
> > Hi, Mark:
> > for the Unit_id, does it have to be Ensembl gene ID like ENSMUSG?
> > Lots of genes do not have ensembl assignment from Affy annotation
> > file. There are lots of missing annotaions, and I still have not found
> > any good way to deal with it. Do you have any suggestions?
>
> > Thanks
>
> > Sabrina
>
> > On Jun 10, 12:32 am, Mark Robinson  wrote:
> >> Hi Sabrina.
>
> >> How about you try and create a 'flat' file like the one described  
> >> at:http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file
> >> ...
>
> >> Presumably, you will be comfortable with the Exon Array's 'probetab'
> >> file by now and possibly the Affymetrix annotation CSV file and so  
> >> you
> >> should have access to all this information.
>
> >> For example, from the following table:
>
> >> mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-
> >> v2.probe.tab
> >> Probe ID        Probe Set ID    probe x probe y assembly        
> >> seqname start   stop
> >> strand  probe sequence  target strandedness     category
> >> 494998  2315101 917     193     build-34/hg16   chr1    1788    
> >> 1812    +
> >> CACGGGAAGTCTGGGCTAAGAGACA       Sense   main
> >> 1734213 2315101 1092    677     build-34/hg16   chr1    1973    
> >> 1997    +
> >> ACACCAGAAGATGAACAATGG       Sense   main
> >> 4767517 2315101 796     1862    build-34/hg16   chr1    1992    
> >> 2016    +
> >> ATTAAGTTACATGCAGACAACAGGG       Sense   main
> >> 4286427 2315101 986     1674    build-34/hg16   chr1    2006    
> >> 2030    +
> >> TGCCTGGTTGTGGTATTAAGTTACA       Sense   main
> >> 5760145 2315102 144     2250    build-34/hg16   chr1    2520    
> >> 2544    +
> >> TCGGCCGTCGTCTTCTGCAGCTCTG       Sense   main
> >> 671410  2315102 689     262     build-34/hg16   chr1    2523    
> >> 2547    +
> >> AAGTCGGCCGTCGTCTTCTGCAGCT       Sense   main
> >> 4275780 2315102 579     1670    build-34/hg16   chr1    2526    
> >> 2550    +
> >> TCCAAGTCGGCCGTCGTCTTCTGCA       Sense   main
> >> 4293462 2315102 341     1677    build-34/hg16   chr1    2531    
> >> 2555    +
> >> TGTGATCCAAGTCGGCCGTCGTCTT       Sense   main
> >> 5388    2315103 267     2       build-34/hg16   chr1    2927    
> >> 2951    +
> >> CTGTCTGTCGACCCAGCTGGAGGCA       Sense   main
> >> [snip]
>
> >> ... you see the second column is the probeset_id, which would be used
> >> as the "Group_ID" column for your flat file.  Depending on whether  
> >> you
> >> are using the Ensembl CDF or the Affymetrix annotation, you would  
> >> need
> >> to create a mapping to get the transcript cluster id column (here,  
> >> the
> >> "Unit_ID").  Everything else you need (Probe_Sequence, X, Y,  
> >> Probe_ID)
> >> is within the table above.
>
> >> Then, it would be just a matter of filtering OUT those probes that
> >> overlap a SNP, which based on your mapping exercise, you must have a
> >> list of.  Then, make a call to the flat2Cdf() script and hopefully
> >> you'll be off and running.
>
> >> Let me know how you go.
>
> >> Cheers,
> >&

[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

2009-06-11 Thread sabrina

Hi, Mark:
for the Unit_id, does it have to be Ensembl gene ID like ENSMUSG?
Lots of genes do not have ensembl assignment from Affy annotation
file. There are lots of missing annotaions, and I still have not found
any good way to deal with it. Do you have any suggestions?

Thanks

Sabrina

On Jun 10, 12:32 am, Mark Robinson  wrote:
> Hi Sabrina.
>
> How about you try and create a 'flat' file like the one described 
> at:http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file...
>
> Presumably, you will be comfortable with the Exon Array's 'probetab'  
> file by now and possibly the Affymetrix annotation CSV file and so you  
> should have access to all this information.
>
> For example, from the following table:
>
> mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-
> v2.probe.tab
> Probe ID        Probe Set ID    probe x probe y assembly        seqname start 
>   stop    
> strand  probe sequence  target strandedness     category
> 494998  2315101 917     193     build-34/hg16   chr1    1788    1812    +     
>    
> CACGGGAAGTCTGGGCTAAGAGACA       Sense   main
> 1734213 2315101 1092    677     build-34/hg16   chr1    1973    1997    +     
>    
> ACACCAGAAGATGAACAATGG       Sense   main
> 4767517 2315101 796     1862    build-34/hg16   chr1    1992    2016    +     
>    
> ATTAAGTTACATGCAGACAACAGGG       Sense   main
> 4286427 2315101 986     1674    build-34/hg16   chr1    2006    2030    +     
>    
> TGCCTGGTTGTGGTATTAAGTTACA       Sense   main
> 5760145 2315102 144     2250    build-34/hg16   chr1    2520    2544    +     
>    
> TCGGCCGTCGTCTTCTGCAGCTCTG       Sense   main
> 671410  2315102 689     262     build-34/hg16   chr1    2523    2547    +     
>    
> AAGTCGGCCGTCGTCTTCTGCAGCT       Sense   main
> 4275780 2315102 579     1670    build-34/hg16   chr1    2526    2550    +     
>    
> TCCAAGTCGGCCGTCGTCTTCTGCA       Sense   main
> 4293462 2315102 341     1677    build-34/hg16   chr1    2531    2555    +     
>    
> TGTGATCCAAGTCGGCCGTCGTCTT       Sense   main
> 5388    2315103 267     2       build-34/hg16   chr1    2927    2951    +     
>    
> CTGTCTGTCGACCCAGCTGGAGGCA       Sense   main
> [snip]
>
> ... you see the second column is the probeset_id, which would be used  
> as the "Group_ID" column for your flat file.  Depending on whether you  
> are using the Ensembl CDF or the Affymetrix annotation, you would need  
> to create a mapping to get the transcript cluster id column (here, the  
> "Unit_ID").  Everything else you need (Probe_Sequence, X, Y, Probe_ID)  
> is within the table above.
>
> Then, it would be just a matter of filtering OUT those probes that  
> overlap a SNP, which based on your mapping exercise, you must have a  
> list of.  Then, make a call to the flat2Cdf() script and hopefully  
> you'll be off and running.
>
> Let me know how you go.
>
> Cheers,
> Mark
>
> On 10/06/2009, at 1:00 PM, sabrina wrote:
>
>
>
>
>
> > Thanks , Mark!
> > Can you show me /walk me through how to get a new snp-free CDF ? I
> > finally got the right version of snp and probe mapping so I am ready
> > to try it out!
>
> > Sabrina
>
> > On Jun 6, 3:14 am, Mark Robinson  wrote:
> >> Hi Sabrina.
>
> >> Comments below.
>
> >> On 06/06/2009, at 1:57 AM, sabrina wrote:
>
> >>> Hi, Mark:
> >>> I finally found the SNP data set that is suitable for my case. As I
> >>> understand, aroma used RMA to estimate gene level and exon level
> >>> intensities. After I estimate gene level (transcript level), I can  
> >>> use
> >>> FIRMA to estimate residual for each exon and compose a score as
> >>> described in the paper . My question is: if there is a SNP  
> >>> difference
> >>> between two strains within one exon, should I exclude that exon from
> >>> estimating transcript level value? My guess is probably no.
>
> >> If the SNP affects only 1 probe in an entire transcript, I would
> >> expect it to have very little impact on the gene-level summary.  And,
> >> especially so if there are a large number of total probes for that
> >> gene.  It may have a noticeable effect on the probe effect.
>
> >>> So will it
> >>> be a good idea if I exclude that exon after I calculate all FIRMA
> >>> scores or  should I exclude these exons after I estimate residuals ,
> >>> but only used these residuals not affected by SNPs for firma score
> >>> estimation? Thanks
>
> >> Keep in mind the residuals are calculated at the pro

[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

2009-06-09 Thread sabrina

Thanks , Mark!
Can you show me /walk me through how to get a new snp-free CDF ? I
finally got the right version of snp and probe mapping so I am ready
to try it out!

Sabrina

On Jun 6, 3:14 am, Mark Robinson  wrote:
> Hi Sabrina.
>
> Comments below.
>
> On 06/06/2009, at 1:57 AM, sabrina wrote:
>
>
>
> > Hi, Mark:
> > I finally found the SNP data set that is suitable for my case. As I
> > understand, aroma used RMA to estimate gene level and exon level
> > intensities. After I estimate gene level (transcript level), I can use
> > FIRMA to estimate residual for each exon and compose a score as
> > described in the paper . My question is: if there is a SNP difference
> > between two strains within one exon, should I exclude that exon from
> > estimating transcript level value? My guess is probably no.
>
> If the SNP affects only 1 probe in an entire transcript, I would  
> expect it to have very little impact on the gene-level summary.  And,  
> especially so if there are a large number of total probes for that  
> gene.  It may have a noticeable effect on the probe effect.
>
> > So will it
> > be a good idea if I exclude that exon after I calculate all FIRMA
> > scores or  should I exclude these exons after I estimate residuals ,
> > but only used these residuals not affected by SNPs for firma score
> > estimation? Thanks
>
> Keep in mind the residuals are calculated at the probe-level, not the  
> probeset-level.  The FIRMA score is then a summary of the all the  
> residuals for a probeset.
>
> I think you have (at least) 3 choices:
>
> 1. (preferred, i would think) you could remove all affected *probes*  
> (via the creation of a SNP-affected-probe-free CDF) in advance, then  
> run FIRMA as normal.  I can help with this if you tell me which probes  
> are affected.
>
> 2. remove the affected *probesets* afterwards, since you may not  
> believe the FIRMA scores for which these are based on.
>
> 3. as you suggested, only calculate FIRMA scores from unaffected  
> residuals.  But, the information you require to do this is the same  
> information required to do #1 and it would seems like #1 is preferred.
>
> The good thing about option #1 is you would still have some ability to  
> detect differential splicing for the probeset (instead of tossing it  
> away), albeit with the smaller number of remaining unaffected probes.
>
> Cheers,
> Mark
>
>
>
> > Sabrina
>
> > On Apr 30, 3:46 am, Mark Robinson  wrote:
> >> Hi Sabrina.
>
> >> I have not had to deal with this myself, but I do know that it exists
> >> and I can at least suggest a possible route to exclude affected  
> >> exons.
>
> >> Presumably, there is a database (dbSNP?) that tells you the genome
> >> locations of each SNP for your strains.  There is also a probe.tab
> >> file from Affymetrix that gives you the mapped genome locations of
> >> each probe (or you could take the sequences from the same file and  
> >> map
> >> them yourself with a tool like BLAT).  It is then just a matter of
> >> looking whether each probe maps to a location on the genome that
> >> overlaps a SNP.  There is probably a Bioconductor tool for this or  
> >> you
> >> could create a hash, etc.
>
> >> There are a couple levels at which you might introduce this to your
> >> analysis.  You could remove individual probes that are affected.  On
> >> the aroma.affymetrix side, this would require creating a new CDF with
> >> those affected probes not included (a bit tricky but doable).  Or,  
> >> you
> >> could simply post-process your existing results and remove probesets
> >> that have an affected probe (easier but not as elegant).
>
> >> You might've also seen:
>
> >> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A
> >> database for filtering out
> >> probes in the Affymetrix GeneChip(R) HumanExon1.0 ST array
> >> potentially affected bySNPs.
> >> Bioinformation 2008, 2(10):469{470.
>
> >> Hope that gets you started.
>
> >> Cheers,
> >> Mark
>
> >> On 30/04/2009, at 6:07 AM, sabrina wrote:
>
> >>> Hi, all:
> >>> I am using Aroma for detectingexonskipping events around two groups
> >>> (two different strains). I found out that several of my top hits
> >>> indeed includes at least one SNP between two strains. I wonder if
> >>> anyone has some suggestion about how to deal with this situation.  
> >>> If I
> >>> need to remove all affected exons from analys

[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

2009-06-05 Thread sabrina

Hi, Mark:
I finally found the SNP data set that is suitable for my case. As I
understand, aroma used RMA to estimate gene level and exon level
intensities. After I estimate gene level (transcript level), I can use
FIRMA to estimate residual for each exon and compose a score as
described in the paper . My question is: if there is a SNP difference
between two strains within one exon, should I exclude that exon from
estimating transcript level value? My guess is probably no. So will it
be a good idea if I exclude that exon after I calculate all FIRMA
scores or  should I exclude these exons after I estimate residuals ,
but only used these residuals not affected by SNPs for firma score
estimation? Thanks

Sabrina

On Apr 30, 3:46 am, Mark Robinson  wrote:
> Hi Sabrina.
>
> I have not had to deal with this myself, but I do know that it exists  
> and I can at least suggest a possible route to exclude affected exons.
>
> Presumably, there is a database (dbSNP?) that tells you the genome  
> locations of each SNP for your strains.  There is also a probe.tab  
> file from Affymetrix that gives you the mapped genome locations of  
> each probe (or you could take the sequences from the same file and map  
> them yourself with a tool like BLAT).  It is then just a matter of  
> looking whether each probe maps to a location on the genome that  
> overlaps a SNP.  There is probably a Bioconductor tool for this or you  
> could create a hash, etc.
>
> There are a couple levels at which you might introduce this to your  
> analysis.  You could remove individual probes that are affected.  On  
> the aroma.affymetrix side, this would require creating a new CDF with  
> those affected probes not included (a bit tricky but doable).  Or, you  
> could simply post-process your existing results and remove probesets  
> that have an affected probe (easier but not as elegant).
>
> You might've also seen:
>
> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A  
> database for filtering out
> probes in the Affymetrix GeneChip(R) HumanExon1.0 ST array  
> potentially affected bySNPs.
> Bioinformation 2008, 2(10):469{470.
>
> Hope that gets you started.
>
> Cheers,
> Mark
>
> On 30/04/2009, at 6:07 AM, sabrina wrote:
>
>
>
> > Hi, all:
> > I am using Aroma for detectingexonskipping events around two groups
> > (two different strains). I found out that several of my top hits
> > indeed includes at least one SNP between two strains. I wonder if
> > anyone has some suggestion about how to deal with this situation. If I
> > need to remove all affected exons from analysis, how can I do it? I
> > never worked with SNP data before, can anyone give me a hint? Thanks a
> > lot!
>
> > Sabrina
>
> --
> Mark Robinson
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robin...@garvan.org.au
> e: mrobin...@wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> --
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] SNPs affecting EXon splicing detection

2009-04-29 Thread sabrina

Hi, all:
I am using Aroma for detecting exon skipping events around two groups
(two different strains). I found out that several of my top hits
indeed includes at least one SNP between two strains. I wonder if
anyone has some suggestion about how to deal with this situation. If I
need to remove all affected exons from analysis, how can I do it? I
never worked with SNP data before, can anyone give me a hint? Thanks a
lot!

Sabrina
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] view exon effect from intensity plot and FIRMA model

2009-04-12 Thread sabrina

Hi, all:
I am trying to understand the supplementary text of the original FIRMA
paper published in Bioinformatics last year. the S1 figure shows UNR
gene about its exon effect removed or retained. I am confused about
the plots. what are the y-axis? Are they  residuals from RMA fit ,
after running ExonRMAPlm?

Let d be the normalized intensity (log2), r as the residual from plm
fit (log2), p as the probe affinity.   I would assume S1(a) is just
simple plot of d-p, S1(b) as d-p+estimated e(j)? Am I correct? If that
is the case, S1(a) should show exact same information as plot of d
because for all arrays, the probe affinity should be the same? But
from the figures, how can I tell where the splicing events occur? I am
puzzled because I thought it should also remove gene effect (chip
effect)? Can some one explain it to me?

Thanks

Sabrina
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: Error: cannot allocate vector of size 692.8 Mb

2009-02-04 Thread sabrina

Hi, Mark:
Quick question. I was looking at the example of creating cdfs from
scratch. I downloaded biomaRt , but not sure where to get or how to
create the exonBoundary file. Can you give me more information on that?
Thanks!

Sabrina

On Jan 30, 1:13 am, Mark Robinson  wrote:
> Hi Sabrina.
>
> Comments below.
>
> On 30/01/2009, at 7:38 AM, sabrina wrote:
>
>
>
> > Hi, Mark:
> > Thanks for the suggestions.
> > The reason I was using FULL annotation is from what I read of that
> > FIRMA paper. It mentioned that " if this is detection of novel exons,
> > then all probesets might be used", so I was just giving it a try to
> > see what happens. :)
> > I also noticed that FirmaModel also has 3 different summary methods,
> > "median" as default, "upperquartile" and "max". Due to curiosity, I
> > used the other two methods, however I only got NAs for all
> > trascript_clusters. Notice that in the paper, the authors suggested to
> > use lower quartile or minimum, I wonder if there is any coding error.
> > Of course, I have no way to check the code, just wondering...
>
> Its no error, it was intentional.  You will notice (see  
> getFitUnitGroupFunction.FirmaModel) that in addition to  
> 'summaryMethod', there is also an 'operateOn' argument.  When  
> operateOn="residuals" (default), you currently only have the choices  
> of 'mean' or 'median' for 'summaryMethod', which is why you get NAs.  
> We could add minimum and lowest absolute, but the paper states "...  
> Ultimately, we determined that the median of the residuals in an exon  
> gave the best tradeoff between sensitivity to the size and sign of the  
> residuals and robustness to the small number of probes ...".  When  
> operateOn="weights", then you have summaryMethod must be in {median,  
> upperQuartile, max}.  Again, other possibilities could be made  
> available.
>
>
>
> > On Jan 28, 10:12 pm, Mark Robinson  wrote:
> >> Hi Sabrina.
>
> >> First of all, you are somewhat in unchartered waters here.  I
> >> personally don't recommend using the 'full' CDF for FIRMA analysis.
> >> Others can disagree (and I would encourage some discussion about
> >> this ...), but my reasoning is that the majority of the probes in the
> >> full CDF are querying poorly annotated or predicted exonic regions of
> >> the genome.  From what I've seen (on Human Exon 1.0 data), the probe
> >> intensities for these are mostly at background and could have an
> >> unsavoury effect on the PLM modelling ... and passing that on to
> >> FIRMA.  By restricting to core/Ensembl probes, you lessen the effect
> >> of non-responding probes.
>
> >>  From thinking about it (although I haven't actually done on my
> >> datasets), I would suggest a two-stage analysis:
>
> >> 1. PLM/FIRMA analysis on a set of probes in well-annotated regions.
>
> > Do you mean to use the coreR1 annotation? I am not quite sure that I
> > got it.  Do you mean specific chromosome? If that is the case, how can
> > I pass that to FirmaModel? I don't see any parameter that I can use :(
>
> Sorry, I'm mostly thinking out loud here.  I don't have a vignette  
> written up for this workflow, just ideas in my head.  Perhaps  
> Elizabeth has some comment about what to do with the novel exons.
>
> So, yes, by 'well-annotated', I mean to use the 'coreR1' annotation in  
> step 1.  Or least a CDF that includes probes up to the level of  
> annotation that you are comfortable with.  Bit of work required, but  
> you'll notice that we've created Ensembl-based annotation for the  
> Human Exon arrays.  Thats just an alternative (more expansive)  
> definition of the 'core' gene models than what Affy gives.
>
> >> 2. Differential expression analysis on the remaining probes --
> >> strongly differentially expressed ones may be indicative of
> >> transcripts/variants that are not covered in the well-annotated set.
>
> > Again, I am not sure what to define the remaining probes. Can you give
> > me a hint?
> > So what I understood is that I could do exon -AS using core
> > annotation, then use Full annotation just do gene summary for the
> > second stage analysis? I guess I am bit confused :)
>
> Off the top of my head, here are a few more details:
>
> 1. Do standard FIRMA analysis on 'coreR1' probesets.
>
> 2. Using the same normalized data, fit 'ExonRmaPlm' with  
> 'mergeGroups=FALSE' (

[aroma.affymetrix] Re: Error: cannot allocate vector of size 692.8 Mb

2009-01-29 Thread sabrina

Hi, Mark:
Thanks for the suggestions.
The reason I was using FULL annotation is from what I read of that
FIRMA paper. It mentioned that " if this is detection of novel exons,
then all probesets might be used", so I was just giving it a try to
see what happens. :)
I also noticed that FirmaModel also has 3 different summary methods,
"median" as default, "upperquartile" and "max". Due to curiosity, I
used the other two methods, however I only got NAs for all
trascript_clusters. Notice that in the paper, the authors suggested to
use lower quartile or minimum, I wonder if there is any coding error.
Of course, I have no way to check the code, just wondering...


On Jan 28, 10:12 pm, Mark Robinson  wrote:
> Hi Sabrina.
>
> First of all, you are somewhat in unchartered waters here.  I  
> personally don't recommend using the 'full' CDF for FIRMA analysis.  
> Others can disagree (and I would encourage some discussion about  
> this ...), but my reasoning is that the majority of the probes in the  
> full CDF are querying poorly annotated or predicted exonic regions of  
> the genome.  From what I've seen (on Human Exon 1.0 data), the probe  
> intensities for these are mostly at background and could have an  
> unsavoury effect on the PLM modelling ... and passing that on to  
> FIRMA.  By restricting to core/Ensembl probes, you lessen the effect  
> of non-responding probes.
>
>  From thinking about it (although I haven't actually done on my  
> datasets), I would suggest a two-stage analysis:
>
> 1. PLM/FIRMA analysis on a set of probes in well-annotated regions.

Do you mean to use the coreR1 annotation? I am not quite sure that I
got it.  Do you mean specific chromosome? If that is the case, how can
I pass that to FirmaModel? I don't see any parameter that I can use :(
>
> 2. Differential expression analysis on the remaining probes --  
> strongly differentially expressed ones may be indicative of  
> transcripts/variants that are not covered in the well-annotated set.
>

Again, I am not sure what to define the remaining probes. Can you give
me a hint?
So what I understood is that I could do exon -AS using core
annotation, then use Full annotation just do gene summary for the
second stage analysis? I guess I am bit confused :)

> To answer your question below, the table of FIRMA values shouldn't  
> actually be all that large, so this should be possible.  One thing to  
> check first of all is that you have a relatively clean R session.  Do  
> you have tables/data frames/objects in memory that are consuming a  
> bunch of space?  aroma.affymetrix is memory efficient, but it will  
> need some room to work.  I just ran an example here on the FULL CDF  
> and it appears to tick along just fine, though I can see if the memory  
> spikes.
>
> As always when reporting errors, it doesn't hurt to give the results  
> of 'sessionInfo()', 'traceback()' ... and you could even set  
> 'verbose=-40' (say) in your call to 'extractDataFrame' to see where it  
> all goes wrong.

Thanks for the suggestion! I will start to do that!


Sabrina
>
> Hope that helps.
>
> Cheers,
> Mark
>
> On 28/01/2009, at 3:45 AM, sabrina wrote:
>
>
>
>
>
> > Hi, all:
> > I am using FIRMA model for my Mouse Exon array analysis. I am testing
> > the FULL annotation. After I used FirmaModel and fit it, I used the
> > following code to extract Firma Scores:
> > exFirma<-extractDataFrame(firmaScore,addNames=TRUE);
>
> > however, I got the following error:
>
> > Error: cannot allocate vector of size 692.8 Mb
> > In addition: Warning messages:
> > 1: In getUnitGroupCellMap.FirmaFile(ce, ...) :
> >  Reached total allocation of 1535Mb: see help(memory.size)
>
> > I used memory.size(), it gave me:  903.4713
> > and memory.limit() is:  1535.875
>
> > Does anyone have any suggestion to solve this problem? Thanks!!!
>
> > Sabrina
>
> --
> Mark Robinson
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robin...@garvan.org.au
> e: mrobin...@wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> --
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Error: cannot allocate vector of size 692.8 Mb

2009-01-27 Thread sabrina

Hi, all:
I am using FIRMA model for my Mouse Exon array analysis. I am testing
the FULL annotation. After I used FirmaModel and fit it, I used the
following code to extract Firma Scores:
exFirma<-extractDataFrame(firmaScore,addNames=TRUE);


however, I got the following error:

Error: cannot allocate vector of size 692.8 Mb
In addition: Warning messages:
1: In getUnitGroupCellMap.FirmaFile(ce, ...) :
  Reached total allocation of 1535Mb: see help(memory.size)


I used memory.size(), it gave me:  903.4713
and memory.limit() is:  1535.875


Does anyone have any suggestion to solve this problem? Thanks!!!

Sabrina
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: exon array with technical replicates

2009-01-23 Thread sabrina

Hi, Mark:
I did options, it gave me the exact display as what you showed to
me.:)
Here is some of my confusion:
 I checked the exFirma scores of these NAs to see what
transcript_cluster_id they were assigned to so that I could check
whether they do have >5000 probes assigned to it. One of them is :
6824548.  I went back to the file I saved for # of probes per exon to
check how many probes it has. To my surprise, it only has one probeset
and one probes: 6824548.4897698 which matches the probeset_id
(4897698). Then I went ahead to download the affymetrix annotation
file, transcript.csv and probeset.csv. In the transcript.csv file,
6824548 has 17 probes in total, and in the probeset.csv, it shows that
transcript_cluster_id : 6824548 has 5 probesets, 4897698 is one of
them with only one probe assigned to it. I tried the extenedR1 cdf
version, it gave me the same thing (in terms of # of probes per exon,
using nProbesPerExon<-readCdfNbrOfCellsPerUnitGroup(getPathname(cdf))
 I am bit confused. It seemed to me that in the cdf file, some of the
probesets for the gene were missing for this transcript_cluster_id. I
wonder if you can help me out on this . I noticed that you created the
cdf files so you are the right person to ask  :)

The code I generated the # of probes per exon is:
nProbesPerExon<-readCdfNbrOfCellsPerUnitGroup(getPathname(cdf))
nProbesPerExonVector<-unlist(nProbesPerExon)


Thanks and Have a great weekend

Sabrina

On Jan 22, 5:18 pm, Mark Robinson  wrote:
> Hi Sabrina.
>
> See below.
>
> On 23/01/2009, at 3:17 AM, sabrina wrote:
>
>
>
>
>
> > Hi, Mark:
> > Thanks for the suggestions. I think I will go with one of the
> > replicates for now, just to make things simple, later on I will deal
> > with the replicates.
> > Now, here is another problem I have. I used the following code to
> > generate the firma score:
>
> > firma <-FirmaModel(plm);
> > fit(firma,verbose=verbose)
> > firmaScore<- getFirmaScores(firma);
>
> > exFirma<-extractDataFrame(firmaScore,addNames=TRUE);
> > exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)])
>
> > then when I use limma fit, it gave me warnings:
> > Warning message:
> > In lmFit(exFirma[, 6:ncol(exFirma)], mm) :
> >  Some coefficients not estimable: coefficient interpretation may
> > vary.
>
> > It turned out that there are several rows of exFirma (even before the
> > log2) were NAs. But when I check  the overall exon expressions (use
> > ExonRmaPlm with para: mergeGroups=FALSE), for that specific exon (I
> > take the group name as the exon id here), they have real values. I
> > wonder where went wrong, perhaps fit(firma) step?
>
> I'll make a wager that this is due to a small number of probesets that  
> have a large (>500 say) probes assigned to them.  You are using the  
> Mouse Exon array and I know there are some of these probesets in  
> there.  In the interest of time, aroma.affymetrix changes over to  
> median polish (faster), or skips the probeset altogether, depending on  
> how large.
>
> You can see the default settings in options().  For me (and probably  
> you as well), it is:
>
>  > options("aroma.affymetrix.settings")
> [...]
>
> $aroma.affymetrix.settings$models
> $aroma.affymetrix.settings$models$RmaPlm
> $aroma.affymetrix.settings$models$RmaPlm$medianPolishThreshold
> [1] 500   6
>
> $aroma.affymetrix.settings$models$RmaPlm$skipThreshold
> [1] 5000    1
>
> So, I would just remove these rows before you use lmFit().
>
> Cheers,
> Mark
>
>
>
>
>
> > Thanks!
>
> > Sabrina
>
> > On Jan 20, 7:48 pm, Mark Robinson  wrote:
> >> Hi Sabrina.
>
> >>>> Do you have biological replicates of some samples and technical
> >>>> replicates of others?  Or, just technical replicates of everything?
>
> >>> My experiment has two groups, one has 5 samples (biological
> >>> replicates, that is 5 mice from one strain), and the other has 4
> >>> samples. Among 5 samples of the first group, there are two samples
> >>> hybridized twice to two arrays, so I have 4 arrays for that two
> >>> samples
> >>> ( That is what I meant as technical replicate). Does that make  
> >>> sense?
>
> >> OK, now I get it.  Thanks.
>
> >>>> I suspect it would be difficult to justify adding a bunch of extra
> >>>> (adhoc) steps into the FIRMA pipeline.  I don't have a full
> >>>> understanding of your experiment, but what about just dealing  
> >>>> with it
> >>>> when you operate on the FIRMA scores?  When you say "average on  
> 

[aroma.affymetrix] Re: exon array with technical replicates

2009-01-22 Thread sabrina

Hi, Mark:
Thanks for the suggestions. I think I will go with one of the
replicates for now, just to make things simple, later on I will deal
with the replicates.
Now, here is another problem I have. I used the following code to
generate the firma score:

firma <-FirmaModel(plm);
fit(firma,verbose=verbose)
firmaScore<- getFirmaScores(firma);

exFirma<-extractDataFrame(firmaScore,addNames=TRUE);
exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)])

then when I use limma fit, it gave me warnings:
Warning message:
In lmFit(exFirma[, 6:ncol(exFirma)], mm) :
  Some coefficients not estimable: coefficient interpretation may
vary.

It turned out that there are several rows of exFirma (even before the
log2) were NAs. But when I check  the overall exon expressions (use
ExonRmaPlm with para: mergeGroups=FALSE), for that specific exon (I
take the group name as the exon id here), they have real values. I
wonder where went wrong, perhaps fit(firma) step?

Thanks!

Sabrina



On Jan 20, 7:48 pm, Mark Robinson  wrote:
> Hi Sabrina.
>
> >> Do you have biological replicates of some samples and technical
> >> replicates of others?  Or, just technical replicates of everything?
>
> > My experiment has two groups, one has 5 samples (biological
> > replicates, that is 5 mice from one strain), and the other has 4
> > samples. Among 5 samples of the first group, there are two samples
> > hybridized twice to two arrays, so I have 4 arrays for that two  
> > samples
> > ( That is what I meant as technical replicate). Does that make sense?
>
> OK, now I get it.  Thanks.
>
>
>
> >> I suspect it would be difficult to justify adding a bunch of extra
> >> (adhoc) steps into the FIRMA pipeline.  I don't have a full
> >> understanding of your experiment, but what about just dealing with it
> >> when you operate on the FIRMA scores?  When you say "average on plm",
> >> I assume this means an average of the chip effects for those two
> >> samples?
>
> > Yes, that is what I meant. By averaging chipEffects, I actually
> > average the gene signals from the two arrays.
>
> >> You could fit a PLM that estimates a single chip effect for
> >> those two samples and use that for calculating FIRMA scores.
>
> > Do you mean to fit a plm for these two arrays for one sample
> > separately from the other samples? If I just fit plm for these two
> > arrays (say sample 1),will it estimate different probe affinity? If it
> > does , then these chipEffects (gene signals to estimate FIRMA score)
> > won't be compatible , am I correct? Thanks!
>
> I'm not suggesting to fit multiple PLMs for the same gene and somehow  
> combine them.  What I'm suggesting is a single PLM, but where there is  
> only 1 chip effect parameter for those 2 samples.  From a design  
> matrix point of view, this is conceptually straightforward.  On the  
> top of my head though, I don't know how to get the 'preprocessCore'  
> code (this is used under the hood in aroma.affymetrix to do the  
> fitting) to fit such a model.  It may have to be a one-off.
>
> Another alternative is to average the FIRMA scores (subsequent to the  
> standard PLM fitting) for these 2 samples and do a 4 versus 4  
> comparison to look for changes in FIRMA scores.
>
> And yet another alternative is to ignore this altogether.  Its  
> unlikely (maybe? feel free to disagree) that 1 technical replicate  
> amongst 8 biological replicates would cause you to underestimate the  
> variability so much as to significantly overstate the changes you  
> see ... but thats just a hunch.  Presumably, you'd also be validating  
> the major discoveries that you make.
>
> Hope that helps.
>
> Mark
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: exon array with technical replicates

2009-01-20 Thread sabrina

Hi, Mark:



> Do you have biological replicates of some samples and technical  
> replicates of others?  Or, just technical replicates of everything?

My experiment has two groups, one has 5 samples (biological
replicates, that is 5 mice from one strain), and the other has 4
samples. Among 5 samples of the first group, there are two samples
hybridized twice to two arrays, so I have 4 arrays for that two samples
( That is what I meant as technical replicate). Does that make sense?

>
> I suspect it would be difficult to justify adding a bunch of extra  
> (adhoc) steps into the FIRMA pipeline.  I don't have a full  
> understanding of your experiment, but what about just dealing with it  
> when you operate on the FIRMA scores?  When you say "average on plm",  
> I assume this means an average of the chip effects for those two  
> samples?  

Yes, that is what I meant. By averaging chipEffects, I actually
average the gene signals from the two arrays.

>You could fit a PLM that estimates a single chip effect for  
> those two samples and use that for calculating FIRMA scores.

Do you mean to fit a plm for these two arrays for one sample
separately from the other samples? If I just fit plm for these two
arrays (say sample 1),will it estimate different probe affinity? If it
does , then these chipEffects (gene signals to estimate FIRMA score)
won't be compatible , am I correct? Thanks!

Sabrina
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] exon array with technical replicates

2009-01-19 Thread sabrina

Hi, all:
I am working on Affy Mouse Exon Array . Because of the experiment
design and quality of the hybridization, we have two arrays hybridized
from one mouse (same biological sample). I assume that I should treat
these two arrays as tehcnical duplicates. If that is the case, I could
do background correction, normalization and summary separately for
these two arrays,(RMA, and  ExonRmaPlm), then before I use FIRMA to
get firma scores, can I just do average on plm of these two arrays?
But then what do I feed in FIRMAModel? ( The default one is just plm
results  directly from ExonRmaPlm) .Or any suggestions about how to
deal with this situation? My goal is to find novel splicing events,
but right now I am just using core annotation to try it out. Thanks!

Sabrina
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to 
aroma-affymetrix-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: FIRMA score

2008-12-11 Thread sabrina

Hi, Mark!
Thank you so much! After an intensive study of FIRMA and
aroma.affymetrix, I think I sort of figure out a routine for my exon
study.  And now I am more into R programming instead of C or Matlab :)
One side question I still have is the probe affinity effect. I never
used it and just wonder what is it for? Maybe I still don't fully
understand FIRMA model very well :(

Thanks!

Sabrina

On Dec 10, 5:34 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> > I assume that you referred me to the probeset annotation csv file ,
> > not the transcript annotation csv file. :)
>
> > I did get the exon coordinates! it seemed to me that the probeset_id
> > in the probeset annotation file is equivalent to groupName of
> > exFirma.
>
> Yes!  The transcript CSV file may be useful for other things, but  
> indeed not for exon coordinates.
>
> In aroma.affymetrix terminology for exon arrays,  
> unit=transcript_cluster_id, group=probeset_id and cell=probe.
>
> > One question I have is in the exon array plot from GenomeGraphs, the
> > unrData is the probe level data, in other words, in my case, I should
> > use the normalized data, csN, is that correct?
>
> If you look closely at the Purdom paper, there are a few things you  
> may want to plot in the context of Ensembl annotation.  The normalized  
> data is the obvious one, which you have linked to an object 'csN'.
>
> In addition, you may want to plot the residuals:
>
> rs <- calculateResiduals(plmTr)  # plmTr is the fitted ExonRmaPlm with  
> mergeGroups=TRUE
> d <- extractMatrix(rs,cells=...)
>
> ... or for example you may want to plot the raw data, adjusted by the  
> probe effects.  For example, RMA fits the model
>
> Y_{ij} = a_i + b_j
>
> (Y_{ij} = normalized data, a_i = chip effects, b_j = probe effects),  
> you may be interested in plotting:
>
> Y_{ij} - \hat{b}_j
>
> ... i.e. raw data, adjusted by the estimated probe effects.
>
> > another question is graph related though. Since I have two groups
> > among my 9 samples, when I plot exon array, is there any way to use
> > different color coding for these samples on the same graph? Thanks!
>
> Short answer:
>
> library(GenomeGraphs)
> ?"ExonArray-class"
>
> Long answer:  create your ExonArray object and use DisplayPars slot to  
> specify colour and line width:
>
> ea1<-new("ExonArray", intensity = d, probeStart = ..., probeEnd=...,
>               probeId = ..., nProbes = ..., dp = DisplayPars(color =  
> col, lwd=lwd,
>               mapColor = "dodgerblue2",plotMap=TRUE))
>
> where 'col' and 'lwd' are either length 1 (in which case all lines get  
> the same width and colour) or vectors of the length of the number of  
> samples ...
>
> Thanks for all these questions Sabrina.  This will help make a nice  
> vignette ... when I get time :)
>
> Unless you'd be interested in summarizing your approach! :)
>
> Cheers,
> Mark
>
> > Sabrina
>
> > On Dec 9, 5:02 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> >> Hi Sabrina.
>
> >> Great work!  See below.
>
> >> On 10/12/2008, at 5:27 AM, sabrina wrote:
>
> >>> Hi, Mark!
> >>> Thank you so much! I think I pretty much figured out how to get the
> >>> gene level, exon level expressions and comparison done. I checked  
> >>> the
> >>> GenomeGraphs as you suggested. I tried the code in the exon array
> >>> section, and it worked fine. So here is the question related to my
> >>> data set.
>
> >>> In order to plot exon data, I used the following code to get exon
> >>> summary (intensities) which is pbsetSummLog2:
>
> >>> plmNoMerge<- ExonRmaPlm(csN, mergeGroups=FALSE)
> >>> fit(plmNoMerge)
> >>> readUnits(plmNoMerge,units=1)
>
> >>> chpNoMerge<-getChipEffectSet(plmNoMerge)
>
> >>> pbsetSumm<-extractMatrix(chpNoMerge,returnUgcMap=TRUE)
> >>> pbsetSummLog2<-log2(pbsetSumm);
> >>> pbsetNames<-readCdfGroupNames(getPathname(getCdf(chpNoMerge)),
> >>>    unit=unique(attr(pbsetSumm,"unitGroupCellMap")[,"unit"]))
> >>> rownames(pbsetSumm)<-unlist(pbsetNames)
> >>> rownames(pbsetSummLog2)<-unlist(pbsetNames)
>
> >>> Because each significant exon that was detected by FIRMA is  
> >>> associated
> >>> with one gene, and that gene has several exons, therefore, I used  
> >>> the
> >>> following to find all exons associated to that gene:
> >>> (x is the result fr

[aroma.affymetrix] Re: FIRMA score

2008-12-10 Thread sabrina

Thanks! Mark:
I assume that you referred me to the probeset annotation csv file ,
not the transcript annotation csv file. :)
I did get the exon coordinates! it seemed to me that the probeset_id
in the probeset annotation file is equivalent to groupName of
exFirma.

One question I have is in the exon array plot from GenomeGraphs, the
unrData is the probe level data, in other words, in my case, I should
use the normalized data, csN, is that correct?

another question is graph related though. Since I have two groups
among my 9 samples, when I plot exon array, is there any way to use
different color coding for these samples on the same graph? Thanks!

Sabrina

On Dec 9, 5:02 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> Hi Sabrina.
>
> Great work!  See below.
>
> On 10/12/2008, at 5:27 AM, sabrina wrote:
>
>
>
>
>
> > Hi, Mark!
> > Thank you so much! I think I pretty much figured out how to get the
> > gene level, exon level expressions and comparison done. I checked the
> > GenomeGraphs as you suggested. I tried the code in the exon array
> > section, and it worked fine. So here is the question related to my
> > data set.
>
> > In order to plot exon data, I used the following code to get exon
> > summary (intensities) which is pbsetSummLog2:
>
> > plmNoMerge<- ExonRmaPlm(csN, mergeGroups=FALSE)
> > fit(plmNoMerge)
> > readUnits(plmNoMerge,units=1)
>
> > chpNoMerge<-getChipEffectSet(plmNoMerge)
>
> > pbsetSumm<-extractMatrix(chpNoMerge,returnUgcMap=TRUE)
> > pbsetSummLog2<-log2(pbsetSumm);
> > pbsetNames<-readCdfGroupNames(getPathname(getCdf(chpNoMerge)),
> >    unit=unique(attr(pbsetSumm,"unitGroupCellMap")[,"unit"]))
> > rownames(pbsetSumm)<-unlist(pbsetNames)
> > rownames(pbsetSummLog2)<-unlist(pbsetNames)
>
> > Because each significant exon that was detected by FIRMA is associated
> > with one gene, and that gene has several exons, therefore, I used the
> > following to find all exons associated to that gene:
> > (x is the result from topTable)
>
> > temp<-grep(exFirma[x$ID,1][1],exFirma[,1]);
> > gp_temp<-exFirma[temp,2]
>
> > gene1<-pbsetSummLog2[temp,]
>
> > which gives me:
> >                  array1              array 2 ...
> > 4308385  6.338784        6.896304
> > 4376965  1.973171        2.272406
>
> > I know that "430835" is the groupName (aka exon id), but I am not sure
> > where to find the start and end position of these individual exons,
> > can you show me how to do it?
>
> > The reason I asked this is because in makeExonArray, I need the
> > probeStart and probeEnd positions. Thanks!!!
>
> Well, you can get the probeset start and end positions from the  
> 'NetAffx Annotation Files' from Affymetrix.  For human, this would be  
> at:
>
> http://www.affymetrix.com/products_services/arrays/specific/exon.affx...
>
> I think you said you were using mouse, so you can find this at:
>
> http://www.affymetrix.com/products_services/arrays/specific/mouse_exo...
>
> Find the 'Current NetAffx Annotation Files' section and download the  
> csv.zip file.  Just to have a quick peak at a few columns of this  
> file, if I run a unix tool called awk on the CSV file you get:
>
> 
> awk '{FS=","; print $7,$1,$2,$3,$4,$5,$16}' HuEx-1_0-st-
> v2.na25.hg18.probeset.csv | grep -v "^ " | more
> 
>
> "transcript_cluster_id" "probeset_id" "seqname" "strand" "start"  
> "stop" "level"
> ...
> "2315373" "2315374" "chr1" "+" "742655" "742719" "core"
> "2315373" "2315375" "chr1" "+" "742869" "743231" "core"
> "2315373" "2315376" "chr1" "+" "743293" "743434" "core"
> "2315373" "2315377" "chr1" "+" "744094" "744979" "core"
> "2315380" "2315381" "chr1" "+" "752897" "752974" "extended"
> "2315380" "2315382" "chr1" "+" "754246" "754274" "extended"
> ...
>
> so there you have everything you need -- identifiers and chromosome  
> locations.  You can just use 'read.csv' to read it into R and pull off  
> the coordinates you need.
>
> Cheers,
> Mark
>
>
>
> > Sabrina
> > On Dec 8, 4:54 pm, Mark Robinson <[EMAIL PR

[aroma.affymetrix] Re: FIRMA score

2008-12-09 Thread sabrina

Hi, Mark!
Thank you so much! I think I pretty much figured out how to get the
gene level, exon level expressions and comparison done. I checked the
GenomeGraphs as you suggested. I tried the code in the exon array
section, and it worked fine. So here is the question related to my
data set.

In order to plot exon data, I used the following code to get exon
summary (intensities) which is pbsetSummLog2:

plmNoMerge<- ExonRmaPlm(csN, mergeGroups=FALSE)
fit(plmNoMerge)
readUnits(plmNoMerge,units=1)

chpNoMerge<-getChipEffectSet(plmNoMerge)

pbsetSumm<-extractMatrix(chpNoMerge,returnUgcMap=TRUE)
pbsetSummLog2<-log2(pbsetSumm);
pbsetNames<-readCdfGroupNames(getPathname(getCdf(chpNoMerge)),
unit=unique(attr(pbsetSumm,"unitGroupCellMap")[,"unit"]))
rownames(pbsetSumm)<-unlist(pbsetNames)
rownames(pbsetSummLog2)<-unlist(pbsetNames)

Because each significant exon that was detected by FIRMA is associated
with one gene, and that gene has several exons, therefore, I used the
following to find all exons associated to that gene:
(x is the result from topTable)

temp<-grep(exFirma[x$ID,1][1],exFirma[,1]);
gp_temp<-exFirma[temp,2]

gene1<-pbsetSummLog2[temp,]

which gives me:
  array1  array 2 ...
4308385  6.3387846.896304
4376965  1.9731712.272406

I know that "430835" is the groupName (aka exon id), but I am not sure
where to find the start and end position of these individual exons,
can you show me how to do it?

The reason I asked this is because in makeExonArray, I need the
probeStart and probeEnd positions. Thanks!!!

Sabrina
On Dec 8, 4:54 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> Hi Sabrina.
>
> On 08/12/2008, at 3:25 PM, sabrina wrote:
>
>
>
>
>
> > Hi, Mark:
> > Thanks for the help! I think now I have better understanding of LIMMA
> > now after I read the related chapter of LIMMA user's guide :)
>
> > On Dec 4, 4:10 pm, "Mark Robinson" <[EMAIL PROTECTED]> wrote:
> >> Sabrina.
>
> >> When asking questions like this, please post your full set of  
> >> commands.
> >> Its hard to decode what variables have been set without this. :)
>
> >> I'm guessing that affyAnnotation is what came from the 'read.csv'  
> >> call and
> >> x2 is the output of topTable?  Have you set the rownames of exFirma?
>
> > x2 is the output of topTable
> > I do not know how to set rownames of exFirma, when I check exFirma
> > [1:3,1:6], it gave me the following:
>
> > unitName groupName unit group cell    Array1
> > 1  6838637   4304927    1     1    1  1.0372756
> > 2  6838637   4330595    1     2    2 -1.1679380
> > 3  6838637   4356771    1     3    3  1.2217411
>
> > Can you show me how to set the rownames? where can I find the row
> > names? Is it correct that groupName is the exon id ?
>
> A good place to start:
>
> ?rownames
>
>
>
> > I used the following code to import affymetrix annotation file:
>
> > affyAnnotation<- read.csv(file="MoEx-1_0-st-
> > v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20);
>
> > and
>
> >> Is this just a typo then?  It should be
> >> affyAnnotation$transcript_cluster_id instead of
> >> affyAnnotation$transcript_cluster_Id.
>
> >> What does exFirma[x2$ID,1] actually give you?  If the  
> >> rownames(exFirma)
> >> are not set, this could mislead you.
>
> > I used the suggestion you gave to me to redo the lmFit, and here is
> > what I used :
>
> > mm <- model.matrix(~cls)
> > fit <- lmFit(exFirma[,6:ncol(exFirma)], mm)
> > fit<-eBayes(fit)
> > x<-topTable(fit,coef=2,adj="fdr")
>
> > Then I used exFirma[x$ID,1], it gave me:
> > [1] "6996667" "6965348" "6982181" "6917790" "6899766" "7005797"
> > "6880577" "6962054" "6897349"
> > [10] "7018774"
>
> > when I printed x, it gave me:
>
> >           ID     logFC         t      P.Value adj.P.Val         B
> > 11253   11253 -4.856783 -8.524568 2.223806e-06 0.4982258 -4.323317
> > 183160 183160  4.058556  7.676406 6.413325e-06 0.7184271 -4.332955
> > 32235   32235 -3.692429 -6.555475 2.963704e-05 1.000 -4.350176
> > 132610 132610  2.315536  6.508698 3.170265e-05 1.000 -4.351036
> > 165279 165279  3.269404  6.311540 4.225007e-05 1.000 -4.354809
> > 171777 171777  3.845620  6.174738 5.172731e-05 1.000 -4.357574
> > 45380   45380  1.971388  6.088251 5.886544e-05 1.000 -4.359388
> > 132873 132873 -4.569042 -6.021868 6.505114e-0

[aroma.affymetrix] Re: FIRMA score

2008-12-07 Thread sabrina

Hi, Mark:
Thanks for the help! I think now I have better understanding of LIMMA
now after I read the related chapter of LIMMA user's guide :)


On Dec 4, 4:10 pm, "Mark Robinson" <[EMAIL PROTECTED]> wrote:
> Sabrina.
>
> When asking questions like this, please post your full set of commands.
> Its hard to decode what variables have been set without this. :)
>
> I'm guessing that affyAnnotation is what came from the 'read.csv' call and
> x2 is the output of topTable?  Have you set the rownames of exFirma?

x2 is the output of topTable
I do not know how to set rownames of exFirma, when I check exFirma
[1:3,1:6], it gave me the following:

 unitName groupName unit group cellArray1
1  6838637   43049271 11  1.0372756
2  6838637   43305951 22 -1.1679380
3  6838637   43567711 33  1.2217411


Can you show me how to set the rownames? where can I find the row
names? Is it correct that groupName is the exon id ?


I used the following code to import affymetrix annotation file:

affyAnnotation<- read.csv(file="MoEx-1_0-st-
v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20);

and
>
> Is this just a typo then?  It should be
> affyAnnotation$transcript_cluster_id instead of
> affyAnnotation$transcript_cluster_Id.
>
> What does exFirma[x2$ID,1] actually give you?  If the rownames(exFirma)
> are not set, this could mislead you.

I used the suggestion you gave to me to redo the lmFit, and here is
what I used :

mm <- model.matrix(~cls)
fit <- lmFit(exFirma[,6:ncol(exFirma)], mm)
fit<-eBayes(fit)
x<-topTable(fit,coef=2,adj="fdr")


Then I used exFirma[x$ID,1], it gave me:
 [1] "6996667" "6965348" "6982181" "6917790" "6899766" "7005797"
"6880577" "6962054" "6897349"
[10] "7018774"

when I printed x, it gave me:

   ID logFC t  P.Value adj.P.Val B
11253   11253 -4.856783 -8.524568 2.223806e-06 0.4982258 -4.323317
183160 183160  4.058556  7.676406 6.413325e-06 0.7184271 -4.332955
32235   32235 -3.692429 -6.555475 2.963704e-05 1.000 -4.350176
132610 132610  2.315536  6.508698 3.170265e-05 1.000 -4.351036
165279 165279  3.269404  6.311540 4.225007e-05 1.000 -4.354809
171777 171777  3.845620  6.174738 5.172731e-05 1.000 -4.357574
45380   45380  1.971388  6.088251 5.886544e-05 1.000 -4.359388
132873 132873 -4.569042 -6.021868 6.505114e-05 1.000 -4.360815
86037   86037  3.206732  5.890372 7.943255e-05 1.000 -4.363738
27761   27761 -2.651559 -5.825434 8.774484e-05 1.000 -4.365229



when I typed exFirma[11253,], it gave me:

 exFirma[11253,]
  unitName groupName unit group  cell   array1, ...
11253  6996667   5092691  535 6 11253 2.718783 ...

so it appeared to me that x$ID matches the index of exFirma entry.

Then I used;
 index<-match(exFirma[x$ID,1], affyAnnotation$transcript_cluster_id)
 to find the index of these entries (significant different between two
groups) in Affy annotation, then I checked the entries in Affy
Annotation file:
 affyAnnotation[index, 1:5]

transcript_cluster_id probeset_id  seqname strand start
71605   6996667 6996667 chr9  -  70631116
62004   6965348 6965348 chr7  + 151009070
67351   6982181 6982181 chr8  -  47660950
...


But then I don't know how to check whether it is correct or not , or
how to find and plot corresponding probeset (exon ) as what was shown
in the FIRMA paper supplementary . Can you give me some hints? Thanks!

Sabrina

>
> Cheers,
> Mark
>
>
>
> > Ok, I just used the following line to read Affy Annotation
>
> >  read.csv(file="MoEx-1_0-st-
> > v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20)
>
> > then I used
> >  match(exFirma[x2$ID,1],affyAnnotation$transcript_cluster_Id)
>
> > but I only got NAs
>
> > I can not figure out what went wrong because I checked the first ID,
> > it is in Affy annotation file. Can you point out what I did
> > incorrectly? Thanks !
>
> > Sabrina
>
> > On Dec 3, 4:53 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> >> Sabrina.
>
> >> See below.
>
> >> On 04/12/2008, at 8:41 AM, sabrina wrote:
>
> >> > Hi, Mark:
>
> >> > Thank you so much for the detailed explanation!
> >> > Actually I am working on Mouse Exon arrays.  I did check the UCSC
> >> > genome browser, but they do not have the Exon Array available  as
> >> > Human Exon.
>
> >> Right! Good point.
>
> >> > My study is trying to find alternative splicing events between two
> >> > grou

[aroma.affymetrix] Re: FIRMA score

2008-12-04 Thread sabrina

Ok, I just used the following line to read Affy Annotation

 read.csv(file="MoEx-1_0-st-
v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20)

then I used
 match(exFirma[x2$ID,1],affyAnnotation$transcript_cluster_Id)

but I only got NAs

I can not figure out what went wrong because I checked the first ID,
it is in Affy annotation file. Can you point out what I did
incorrectly? Thanks !

Sabrina

On Dec 3, 4:53 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> Sabrina.
>
> See below.
>
> On 04/12/2008, at 8:41 AM, sabrina wrote:
>
> > Hi, Mark:
>
> > Thank you so much for the detailed explanation!
> > Actually I am working on Mouse Exon arrays.  I did check the UCSC
> > genome browser, but they do not have the Exon Array available  as
> > Human Exon.
>
> Right! Good point.
>
>
>
> > My study is trying to find alternative splicing events between two
> > groups with identical genotypes but different treatments. each group
> > has 4 or 5 biological replicates. So I did all as described in the
> > examples from Human Exon array, then used the following codes to find
> > the significant genes with potential splicing events:
>
> > ...
> > exFirma<-extractDataFrame(firmaScore,addNames=TRUE);
> > exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)])
>
> > library(limma)
>
> > design <- cbind(Grp1=c(1,1,1,1,1,0,0,0,0),Grp2=c(0,0,0,0,0,1,1,1,1))
>
> > fit<-lmFit(exFirma[,6:ncol(exFirma)],design)
> > fit<-eBayes(fit)
> > x<-topTable(fit)
>
> Careful here as to what you are fitting and testing with limma.  You  
> are fitting a parameterization that would require a contrast to be fit  
> ( ... such as 'contrasts.fit' ... ) in order to assess the differences  
> between your 2 groups.  Did you get this example from a web page or a  
> thread somewhere?  If so, I should correct it.
>
> Probably what you want is:
>
> design <- cbind(Grp1=1,Grp2=c(0,0,0,0,0,1,1,1,1))
>
> ...
>
> topTable(fit,coef=2)
>
> Under this parameterization, setting coef=2 will test the difference  
> between the 2 groups.  Note that topTable by default gives the top 10  
> values, regardless of statistics.
>
> Does that make sense?
>
> > However I only got about 10 IDs with all of which adjusted P values as
> > 1. I wonder if there is anything I did wrong here. Also when I checked
> > the IDs back to Affymetrix annotation file, I could only find one
> > match, I wonder what went wrong.
>
> How exactly did you match them back?
>
> Mark
>
>
>
> > Thanks!
>
> > BTW, I used the coreR1 , A20080718, MR, cdf version
>
> > Sabrina
>
> > On Dec 2, 6:27 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> >> Hi Sabrina.
>
> >> Indeed, you'll want to set 'mergeGroups=TRUE' for the probe level
> >> model (ExonRmaPlm object) that you send to 'FirmaModel' ...
>
> >> -http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-
> >> ...
> >>   says:
> >> ...
> >> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> >> print(plmTr)
> >> ...
> >> firma <- FirmaModel(plmTr)
> >> fit(firma, verbose=verbose)
> >> -
>
> >> NO, the unitName is not the same as gene id in NCBI.  If you are  
> >> using
> >> the CDF from:
>
> >>http://groups.google.com/group/aroma-affymetrix/web/affymetrix-
> >> define...
>
> >> (e.g. HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf) the unitName is an
> >> Affymetrix-defined identifier.
>
> >> If you go to:
>
> >>http://www.affymetrix.com/products_services/arrays/specific/
> >> exon.affx...
>
> >> you will see a 'HuEx-1_0-st-v2 Probeset Annotations' file that you  
> >> can
> >> download.  In there, you will find the annotations you need (unitName
> >> corresponds to the 'transcript_cluster_id' column and groupName
> >> corresponds to the 'probeset_id' column)
>
> >> Also, you may be interested to know that the UCSC genome browser  
> >> has a
> >> track called 'Affy HuEx 1.0' under the set of 'Expression' tracks.
> >> Unfortunately, its not searchable on the browser, but quite useful if
> >> you can locate it in the browser and find out which exon is
> >> differentially spliced.
>
> >> Hope that helps.
> >> Mark
>
> >> On 03/12/2008, at 5:56 AM, sabrina wrote:
>
> >>> Hi, all:
> >&g

[aroma.affymetrix] Re: FIRMA score

2008-12-04 Thread sabrina

Hi, Mark:
Thanks again, here comes more questions :)

I got the example from one of the threads here, the original post was:

library(limma)
fit<-lmFit(exFirma[,6:ncol(exFirma)], model.matrix(~
your.class.variable))
fit<-eBayes(fit)

I thought I should be ok to define the design , then fit to it. Am I
wrong? Or if I want to use model.matrix, which variable should I use
in terms of firmaScore? I am bit confused. . I also looked at LIMMA
userguide, so I figured maybe I can do this:

contrast.matrix<-makeContrasts(group1-group2,levels=design)
fit2<- contrasts.fit(fit,contrast.matrix)
fit2<-eBayes(fit2)
x2<-topTable(fit2)


but when I ran it, it gave me warning as follows:

Warning message:
In contrasts.fit(fit, contrast.matrix) :
  row names of contrasts don't match col names of coefficients


Can you tell me what I did wrong here?


> Careful here as to what you are fitting and testing with limma.  You  
> are fitting a parameterization that would require a contrast to be fit  
> ( ... such as 'contrasts.fit' ... ) in order to assess the differences  
> between your 2 groups.  Did you get this example from a web page or a  
> thread somewhere?  If so, I should correct it.
>
> Probably what you want is:
>
> design <- cbind(Grp1=1,Grp2=c(0,0,0,0,0,1,1,1,1))
>
> ...
>
> topTable(fit,coef=2)
>
> Under this parameterization, setting coef=2 will test the difference  
> between the 2 groups.  Note that topTable by default gives the top 10  
> values, regardless of statistics.
>
> Does that make sense?
>
> > However I only got about 10 IDs with all of which adjusted P values as
> > 1. I wonder if there is anything I did wrong here. Also when I checked
> > the IDs back to Affymetrix annotation file, I could only find one
> > match, I wonder what went wrong.
>
> How exactly did you match them back?


once I got x as the topTable, I used exFirma[x$ID,] to see what are
the top lists, it showed me ID with unitName and Group Name. So I did
manually, using unitName to find a match in AffyAnnotation file. Or
should I use GroupName instead? One other problem is that Excel only
can not display all rows of the annotation file, so some of them I
can't not search. Is there any easy way to do it in R and display it
in R?  I usually use Matlab, so I am very unfamiliar with R.

Thank you so much!

Sabrina


--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] Re: FIRMA score

2008-12-03 Thread sabrina

Hi, Mark:

Thank you so much for the detailed explanation!
Actually I am working on Mouse Exon arrays.  I did check the UCSC
genome browser, but they do not have the Exon Array available  as
Human Exon.

My study is trying to find alternative splicing events between two
groups with identical genotypes but different treatments. each group
has 4 or 5 biological replicates. So I did all as described in the
examples from Human Exon array, then used the following codes to find
the significant genes with potential splicing events:


...
exFirma<-extractDataFrame(firmaScore,addNames=TRUE);
exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)])


library(limma)

design <- cbind(Grp1=c(1,1,1,1,1,0,0,0,0),Grp2=c(0,0,0,0,0,1,1,1,1))

fit<-lmFit(exFirma[,6:ncol(exFirma)],design)
fit<-eBayes(fit)
x<-topTable(fit)

However I only got about 10 IDs with all of which adjusted P values as
1. I wonder if there is anything I did wrong here. Also when I checked
the IDs back to Affymetrix annotation file, I could only find one
match, I wonder what went wrong. Thanks!

BTW, I used the coreR1 , A20080718, MR, cdf version

Sabrina

On Dec 2, 6:27 pm, Mark Robinson <[EMAIL PROTECTED]> wrote:
> Hi Sabrina.
>
> Indeed, you'll want to set 'mergeGroups=TRUE' for the probe level  
> model (ExonRmaPlm object) that you send to 'FirmaModel' ...
>
> -http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-...
>   says:
> ...
> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
> print(plmTr)
> ...
> firma <- FirmaModel(plmTr)
> fit(firma, verbose=verbose)
> -
>
> NO, the unitName is not the same as gene id in NCBI.  If you are using  
> the CDF from:
>
> http://groups.google.com/group/aroma-affymetrix/web/affymetrix-define...
>
> (e.g. HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf) the unitName is an  
> Affymetrix-defined identifier.
>
> If you go to:
>
> http://www.affymetrix.com/products_services/arrays/specific/exon.affx...
>
> you will see a 'HuEx-1_0-st-v2 Probeset Annotations' file that you can  
> download.  In there, you will find the annotations you need (unitName  
> corresponds to the 'transcript_cluster_id' column and groupName  
> corresponds to the 'probeset_id' column)
>
> Also, you may be interested to know that the UCSC genome browser has a  
> track called 'Affy HuEx 1.0' under the set of 'Expression' tracks.  
> Unfortunately, its not searchable on the browser, but quite useful if  
> you can locate it in the browser and find out which exon is  
> differentially spliced.
>
> Hope that helps.
> Mark
>
> On 03/12/2008, at 5:56 AM, sabrina wrote:
>
>
>
> > Hi, all:
> > I just started learning to use firma analysis. My impression from
> > previous discussions of other users is that we should only use plm
> > with mergeGroup=True, is that correct? Another question is : Is
> > unitName same as gene uid in NCBI database? how can I map groupName to
> > EXON name from NCBI? Thanks !!!
>
> > Sabrina
>
> --
> Mark Robinson
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: [EMAIL PROTECTED]
> e: [EMAIL PROTECTED]
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> --
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] FIRMA score

2008-12-02 Thread sabrina

Hi, all:
I just started learning to use firma analysis. My impression from
previous discussions of other users is that we should only use plm
with mergeGroup=True, is that correct? Another question is : Is
unitName same as gene uid in NCBI database? how can I map groupName to
EXON name from NCBI? Thanks !!!

Sabrina
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---



[aroma.affymetrix] plotRle

2008-10-31 Thread sabrina

Hello:
I am new to R and aroma. I just tried to run the sample scripts to
make sure everything works as it was described before I start to
analyze my exon array data.  When I tried plotRle, I found that at the
figure (boxplot) occupied all most all page, but the xlabel was cut
off (only show several chars. Does anyone know how to control
parameters, such as font size, ylim etc in plotRle? Thanks!

Sabrina
--~--~-~--~~~---~--~~
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/aroma-affymetrix?hl=en
-~--~~~~--~~--~--~---