[aroma.affymetrix] MoGene ST 1.0 CDF file
Hello: I used PdInfo2Cdf to create my own CDF for Mouse Gene ST 1.0 . However, when I compared with r3 version available from aroma.affymetrix, I see a small difference: from r3 version AffymetrixCdfFile: Path: annotationData/chipTypes/MoGene-1_0-st-v1 Filename: MoGene-1_0-st-v1,r3.cdf Filesize: 17.63MB Chip type: MoGene-1_0-st-v1,r3 RAM: 0.00MB File format: v4 (binary; XDA) Dimension: 1050x1050 Number of cells: 1102500 Number of units: 35512 Cells per unit: 31.05 Number of QC units: 1 from my own oversion AffymetrixCdfFile: Path: annotationData/chipTypes/MoGene-1_0-st-v1 Filename: MoGene-1_0-st-v1,pd.cdf Filesize: 17.78MB Chip type: MoGene-1_0-st-v1,pd RAM: 0.00MB File format: v4 (binary; XDA) Dimension: 1050x1050 Number of cells: 1102500 Number of units: 35556 Cells per unit: 31.01 Number of QC units: 0 .. so the number of units is different, so is the number of QC units Can someone tell me what the number of QC units is and how did mine miss that? Thanks -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/
[aroma.affymetrix] exon and psr
Hello: I used FIRMA to detect AS events. I have noticed that in some cases, one exon were represented by multiple psrs, and only one of them was detected as AS event. in this case, I would suspect that this particular AS event is not real. However, in terms of coding, is there any easy way to map psrs to exon and detect AS based on exon instead of psrs? The problem arises when affy has multiple psrs mapped to a region where several transcripts have one or multiple exons in that region. Any suggestions appreciated! Sabrina -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/
[aroma.affymetrix] Firma score help
Hello, all: I have 16 exon arrays from 4 groups, A1, A2 and B1, B2. As are genetically identical but with different treatment,Bs are genetically identical (but different from As) with different treatment 1 and 2. I am interested in finding alternative splicing events that was affected by treatment on A and B and also by genetics (under same treatement). Therefore, the comparison I am interested are A1 vs. A2, B1vsB2, A1 vs B1 (only treatment 1) . My question is, when I do RMA and calculate the FIRMA score, do I use all 16 array I have to get Firma scores, then use limma as suggested before to apply to design matrix and contrast matrix? Or should I for each comparison , do RMA and FIRMA score for exon arrays only involved in that comparison? Thanks for your input! Sabrina -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/
[aroma.affymetrix] using LmFit to FIRMA score
Hi, all A quick question about using lmFit on firma scores from exon array analysis. According to the vignettes, to detect splicing events, we need to use lmFit to find the significant event as following (my code so far): fit <- lmFit(exFirma[,3:ncol(exFirma)] ,design) fit2<-contrasts.fit(fit,cont.matrix); fit2<-eBayes(fit2,proportion=p); topList<-topTable(fit2,coef=contrast[k],adj="fdr"); What this does, to my understanding, is to find firma scores that are significantly different between your contrast groups, say one group is all negative, the other is all positive. But we know exon splicing event is not usually 100%, so it is very likely some samples in one group have -4, -3, and the others in the same group might be 1 or 0.5. but if that is the case, since lmFit basically uses the moderated t- test, will it detect this as significant? If what I just described is reasonable, then will I be better off if I just use the m=median(firma score) for each group and rank the the differences of my contrast group (m1-m2) and find out the most potential splicing events? I guess I am a little bit confused. Thanks Sabrina -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
[aroma.affymetrix] Re: FIRMA SCORE from different test
Hi, Mark: Thanks for the reply. I actually ended up with the #1 as you suggested. The reason I wanted to do this comparison is that I know by adding more data sets, the firma score will be different since it takes into consideration of each sample and each probe of exon of interest. What is interesting is that, after I used #1 dataset and computed firma score, I did another experiment, one is using all firma scores, made design matrix and contrast matrix with all samples, trying to find the top list of comparison A. The other one is to only use firma scores that only involve the comparison A, made design matrix and contrast matrix with only samples involved in comparison A, and got result. As you can imagine, the p values, t values for each exon was quite different. As I digged more into it, I realized that the degree of freedom of residuals for these two tests were quite different, the first one is much bigger than the second dof because it was based on all samples. So I wonder if by including all samples, I accidentally increased significance of exon splicing event? Thanks! Sabrina On Jan 11, 12:08 am, Mark Robinson wrote: > Hi Sabrina. > > Some comments below. > > On 24-Dec-09, at 2:39 AM, sabrina wrote: > > > Hi, everyone: > > I am trying to use aroma to detect splicing events. My dataset > > consists 3 groups (genetically different) with one as control, the > > other two as mutants. each group also had control and treatment > > subgroups. My interest is to compare mutant with control, under normal > > condition,and compare control under two condition (normal and > > treatment), and interactions among mutant and control with condition. > > > I did two different runs for the second comparison (control group > > under two conditions): 1. using all groups , all conditions to do > > normalization, plmFit, firmaScore and use limFit with specific > > contrast matrix to find the splicing events. 2. only use control > > group, under two conditions, do normalization, plmFit, firmaScore, and > > limFit to find the splicing events. > > I'm not sure whether such a comparison will lead you to anything > meaningful. It seems like the best approach "should" be to use all > data together, since that should allow the best estimates of probe > effects to be estimated. Your #1 and #2 don't seem comparable to me. > > > I compared the results from these two runs, they were quite different > > and B values from the topTable were very different. I wonder what is > > the right choice to fit my objective. Thanks > > If you did this with gene expression data in limma, you'd probably > find the same thing -- process the data in different ways and you'll > get different B values. > > Cheers, > Mark > > > > > Sabrina > > > -- > > When reporting problems on aroma.affymetrix, make sure 1) to run the > > latest version of the package, 2) to report the output of > > sessionInfo() and traceback(), and 3) to post a complete code example. > > > You received this message because you are subscribed to the Google > > Groups "aroma.affymetrix" group. > > To post to this group, send email to aroma-affymetrix@googlegroups.com > > To unsubscribe from this group, send email to > > aroma-affymetrix-unsubscr...@googlegroups.com > > For more options, visit this group > > athttp://groups.google.com/group/aroma-affymetrix?hl=en > > -- > Mark Robinson, PhD (Melb) > Epigenetics Laboratory, Garvan > Bioinformatics Division, WEHI > e: m.robin...@garvan.org.au > e: mrobin...@wehi.edu.au > p: +61 (0)3 9345 2628 > f: +61 (0)3 9347 0852 > -- > > __ > The information in this email is confidential and intended solely for the > addressee. > You must not disclose, forward, print or use it without the permission of the > sender. > __ -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
[aroma.affymetrix] Re: firma score comparison
Hi, Mark: Thanks! So I guess using adjusted p value to cut off the significant events will be a better choice than using B values? Sabrina On Jan 11, 12:19 am, Mark Robinson wrote: > Hi Sabrina. > > This is really a limma question, not an aroma.affymetrix one, but I'll > take a quick stab. Basically, you'll notice that regardless of the > value you set for 'p', you'll get the same t-statistics and p-values. > It only changes the B values (posterior log odds). > > Cheers, > Mark > > On 9-Jan-10, at 1:14 AM, sabrina wrote: > > > > > Hi, all: > > A quick question. I know after we extract frima score, I should use > > eBayes to do the following: > > > assume we have design matrix and contrast matrix, > > fit<-lmFit(exFirma,design) > > fit2<-contrasts.fit(fit,cont.matrix); > > fit2<-eBayes(fit2,proportion=p); > > > the default setting for p is 0.01 that is for gene expression. Is 0.01 > > also a good guess for this firma score purpose? Or should I use 0.001 > > or some other values? Thanks! > > > Sabrina > > -- > > When reporting problems on aroma.affymetrix, make sure 1) to run the > > latest version of the package, 2) to report the output of > > sessionInfo() and traceback(), and 3) to post a complete code example. > > > You received this message because you are subscribed to the Google > > Groups "aroma.affymetrix" group. > > To post to this group, send email to aroma-affymetrix@googlegroups.com > > To unsubscribe from this group, send email to > > aroma-affymetrix-unsubscr...@googlegroups.com > > For more options, visit this group > > athttp://groups.google.com/group/aroma-affymetrix?hl=en > > -- > Mark Robinson, PhD (Melb) > Epigenetics Laboratory, Garvan > Bioinformatics Division, WEHI > e: m.robin...@garvan.org.au > e: mrobin...@wehi.edu.au > p: +61 (0)3 9345 2628 > f: +61 (0)3 9347 0852 > -- > > __ > The information in this email is confidential and intended solely for the > addressee. > You must not disclose, forward, print or use it without the permission of the > sender. > __ -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
[aroma.affymetrix] firma score comparison
Hi, all: A quick question. I know after we extract frima score, I should use eBayes to do the following: assume we have design matrix and contrast matrix, fit<-lmFit(exFirma,design) fit2<-contrasts.fit(fit,cont.matrix); fit2<-eBayes(fit2,proportion=p); the default setting for p is 0.01 that is for gene expression. Is 0.01 also a good guess for this firma score purpose? Or should I use 0.001 or some other values? Thanks! Sabrina -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
[aroma.affymetrix] FIRMA SCORE from different test
Hi, everyone: I am trying to use aroma to detect splicing events. My dataset consists 3 groups (genetically different) with one as control, the other two as mutants. each group also had control and treatment subgroups. My interest is to compare mutant with control, under normal condition,and compare control under two condition (normal and treatment), and interactions among mutant and control with condition. I did two different runs for the second comparison (control group under two conditions): 1. using all groups , all conditions to do normalization, plmFit, firmaScore and use limFit with specific contrast matrix to find the splicing events. 2. only use control group, under two conditions, do normalization, plmFit, firmaScore, and limFit to find the splicing events. I compared the results from these two runs, they were quite different and B values from the topTable were very different. I wonder what is the right choice to fit my objective. Thanks Sabrina -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
[aroma.affymetrix] Re: EXON question
Hi, Mark: On Nov 17, 11:46 pm, Mark Robinson wrote: > Hi Sabrina. > > Comments below. > > > Hi, Mark: > > Thanks for the information. What I worried about using coordinate is > > that coordinate changes with assembly, while sequences do not change. > > I don't know how many psrs are out of exon boundary, I will look into > > it. But here is an example: > > group Number: 5092691: > > CTTATCGAGATAGTGCTTCTGTGGGTCAATCTAGATATTGATAGATTTGGACTGGAGAAG > > > when I used blat from ensembl, it showed that it is out of exon > > boundary. > > I guess I mean: do you have an example of a (core) probe that maps to > a location not within an bona fide exon? What's above is certainly > not an Affymetrix 25-mer probe, so I'm a bit confused. This is the sequence of the psr or from the unitGroup. The file I used was downloaded from Affy website. According to the website: Probe set sequences consist of the contiguous genomic sequence starting at the beginning of the first probe and ending at the end of the last probe in the set as they are aligned to the genome. They are provided in the orientation they exist in the mRNA in 5'->3' direction. so my hypothesis is that if all of the probes in psr are in the exon region, then this psr sequence should be in it as well, right? > > > Forgot to ask another question. Back to my original question, is there > > an (easy) way to map from partial sequence (i.e. the probeset > > sequence) to exon sequence in a batch mode or in R? Thanks! > > This is not really an aroma.affymetrix question and there are various > answers to this on the Bioconductor mailing list. Personally, I > usually use the findOverlaps() function in IRanges package. Its super > quick. Here is an rough sketch of an example (please don't take this > and assume it work work for you ... this is just to direct you toward > a way to do it): I will look into it! Thanks a lot :) Sabrina > > library(IRanges) > > # assume you have 3 corresponding vectors for the exons: > # 'exonStart', 'exonEnd', 'exonChr' > exonIranges <- mapply(IRanges, start = split(exonStart, exonChr), > end = split(exonEnd, exonChr)) > exonL <- do.call(RangesList, exonIranges) > > # similarly, corresponding vectors for the 25-mer Affy probes > # 'sp' is position, 'ch' is chromosome > probeRanges <- mapply(IRanges, start = split(sp, ch), > end = split(sp+24, ch)) > probeL <- do.call(RangesList, probeRanges) > > # you may wish to make sure that these lists cover the same chromosomes > fo <- findOverlaps(probeL, exonL) > > I'll leave it as an exercise to the reader to unwrap the contents of > the 'fo' object [Hint: see the as.table() method]. In your case, > you'd be interested to know how many probes do not "overlap" within > exon boundaries and I'd guess that you'll be careful what set of > probes you use (e.g. core only?). > > Hope that helps. > > Cheers, > Mark > > > Of course, if I made a mistake when I compiled the CDF, then that is > > different story. I hope not. > > > Any suggestions? Thanks! > > > On Nov 16, 8:10 pm, Mark Robinson wrote: > >> Hi folks. > > >> Note that you can download these directly with the R/Bioconductor > >> package 'rtracklayer'. For example: > > >> library(rtracklayer) > >> session <- browserSession("UCSC") > >> q1 <- ucscTableQuery(session, "refGene", GenomicRanges(genome = > >> "hg18")) > >> refGene <- getTable(q1) > > >> Sabrina: I'm actually surprised that many probes lie outside exon > >> boundaries. They were specifically designed to be inside. Of > >> course, > >> the array was designed on annotation from a few years ago, but > >> still I > >> would expect this to be minimal. Can you give some numbers on this? > >> Or, some examples. > > >> Cheers, > >> Mark > > >> On 17-Nov-09, at 3:41 AM, camelbbs wrote: > > >>>http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ > >>> refFlat.txt.gz and refGene.txt.gz are the similar. > > >>> On Nov 15, 5:02 pm, sabrina wrote: > >>>> Hi, Jiang: > >>>> Can you give me more detail about UCSC RefGene file? Is there a > >>>> link > >>>> to download? Thanks! > > >>>> Sabrina > > >>>> On Nov 15, 11:38 am, camelbbs wrote: > >
[aroma.affymetrix] Re: EXON question
Hi, Mark: Forgot to ask another question. Back to my original question, is there an (easy) way to map from partial sequence (i.e. the probeset sequence) to exon sequence in a batch mode or in R? Thanks! Sabrina On Nov 16, 8:10 pm, Mark Robinson wrote: > Hi folks. > > Note that you can download these directly with the R/Bioconductor > package 'rtracklayer'. For example: > > library(rtracklayer) > session <- browserSession("UCSC") > q1 <- ucscTableQuery(session, "refGene", GenomicRanges(genome = "hg18")) > refGene <- getTable(q1) > > Sabrina: I'm actually surprised that many probes lie outside exon > boundaries. They were specifically designed to be inside. Of course, > the array was designed on annotation from a few years ago, but still I > would expect this to be minimal. Can you give some numbers on this? > Or, some examples. > > Cheers, > Mark > > On 17-Nov-09, at 3:41 AM, camelbbs wrote: > > > > >http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ > > refFlat.txt.gz and refGene.txt.gz are the similar. > > > On Nov 15, 5:02 pm, sabrina wrote: > >> Hi, Jiang: > >> Can you give me more detail about UCSC RefGene file? Is there a link > >> to download? Thanks! > > >> Sabrina > > >> On Nov 15, 11:38 am, camelbbs wrote: > > >>> hi, > >>> I think you can get the probeset coordinates from affy exon array > >>> annotation files, and you can get exon coordinates from the UCSC > >>> RefGene files. Then you can compare them. > >>> Best, > >>> Jiang > > >>> On Nov 15, 6:56 am, sabrina wrote: > > >>>> Hello, all: > >>>> I used FIRMA to find potential spliced genes. But as we know, the > >>>> probesets from affy exon array could be out of exon boundary or > >>>> just > >>>> cover part of exons. I wonder if I have the sequence of the > >>>> probeset > >>>> (which I get from Affy website), how do I do it in batch to find > >>>> whether it is in an exon (ENSEMBL) region , or if it is, how do I > >>>> get > >>>> the entire exon sequence and coordinates? Thanks a lot! > > >>>> Sabrina > > > -- > > When reporting problems on aroma.affymetrix, make sure 1) to run the > > latest version of the package, 2) to report the output of > > sessionInfo() and traceback(), and 3) to post a complete code example. > > > You received this message because you are subscribed to the Google > > Groups "aroma.affymetrix" group. > > To post to this group, send email to aroma-affymetrix@googlegroups.com > > To unsubscribe from this group, send email to > > aroma-affymetrix-unsubscr...@googlegroups.com > > For more options, visit this group > > athttp://groups.google.com/group/aroma-affymetrix?hl=en > > -- > Mark Robinson, PhD (Melb) > Epigenetics Laboratory, Garvan > Bioinformatics Division, WEHI > e: m.robin...@garvan.org.au > e: mrobin...@wehi.edu.au > p: +61 (0)3 9345 2628 > f: +61 (0)3 9347 0852 > -- -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
[aroma.affymetrix] Re: EXON question
Hi, Mark: Thanks for the information. What I worried about using coordinate is that coordinate changes with assembly, while sequences do not change. I don't know how many psrs are out of exon boundary, I will look into it. But here is an example: group Number: 5092691: CTTATCGAGATAGTGCTTCTGTGGGTCAATCTAGATATTGATAGATTTGGACTGGAGAAG when I used blat from ensembl, it showed that it is out of exon boundary. Of course, if I made a mistake when I compiled the CDF, then that is different story. I hope not. Any suggestions? Thanks! On Nov 16, 8:10 pm, Mark Robinson wrote: > Hi folks. > > Note that you can download these directly with the R/Bioconductor > package 'rtracklayer'. For example: > > library(rtracklayer) > session <- browserSession("UCSC") > q1 <- ucscTableQuery(session, "refGene", GenomicRanges(genome = "hg18")) > refGene <- getTable(q1) > > Sabrina: I'm actually surprised that many probes lie outside exon > boundaries. They were specifically designed to be inside. Of course, > the array was designed on annotation from a few years ago, but still I > would expect this to be minimal. Can you give some numbers on this? > Or, some examples. > > Cheers, > Mark > > On 17-Nov-09, at 3:41 AM, camelbbs wrote: > > > > >http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ > > refFlat.txt.gz and refGene.txt.gz are the similar. > > > On Nov 15, 5:02 pm, sabrina wrote: > >> Hi, Jiang: > >> Can you give me more detail about UCSC RefGene file? Is there a link > >> to download? Thanks! > > >> Sabrina > > >> On Nov 15, 11:38 am, camelbbs wrote: > > >>> hi, > >>> I think you can get the probeset coordinates from affy exon array > >>> annotation files, and you can get exon coordinates from the UCSC > >>> RefGene files. Then you can compare them. > >>> Best, > >>> Jiang > > >>> On Nov 15, 6:56 am, sabrina wrote: > > >>>> Hello, all: > >>>> I used FIRMA to find potential spliced genes. But as we know, the > >>>> probesets from affy exon array could be out of exon boundary or > >>>> just > >>>> cover part of exons. I wonder if I have the sequence of the > >>>> probeset > >>>> (which I get from Affy website), how do I do it in batch to find > >>>> whether it is in an exon (ENSEMBL) region , or if it is, how do I > >>>> get > >>>> the entire exon sequence and coordinates? Thanks a lot! > > >>>> Sabrina > > > -- > > When reporting problems on aroma.affymetrix, make sure 1) to run the > > latest version of the package, 2) to report the output of > > sessionInfo() and traceback(), and 3) to post a complete code example. > > > You received this message because you are subscribed to the Google > > Groups "aroma.affymetrix" group. > > To post to this group, send email to aroma-affymetrix@googlegroups.com > > To unsubscribe from this group, send email to > > aroma-affymetrix-unsubscr...@googlegroups.com > > For more options, visit this group > > athttp://groups.google.com/group/aroma-affymetrix?hl=en > > -- > Mark Robinson, PhD (Melb) > Epigenetics Laboratory, Garvan > Bioinformatics Division, WEHI > e: m.robin...@garvan.org.au > e: mrobin...@wehi.edu.au > p: +61 (0)3 9345 2628 > f: +61 (0)3 9347 0852 > -- -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
[aroma.affymetrix] Re: EXON question
Hi, Jiang: Can you give me more detail about UCSC RefGene file? Is there a link to download? Thanks! Sabrina On Nov 15, 11:38 am, camelbbs wrote: > hi, > I think you can get the probeset coordinates from affy exon array > annotation files, and you can get exon coordinates from the UCSC > RefGene files. Then you can compare them. > Best, > Jiang > > On Nov 15, 6:56 am, sabrina wrote: > > > Hello, all: > > I used FIRMA to find potential spliced genes. But as we know, the > > probesets from affy exon array could be out of exon boundary or just > > cover part of exons. I wonder if I have the sequence of the probeset > > (which I get from Affy website), how do I do it in batch to find > > whether it is in an exon (ENSEMBL) region , or if it is, how do I get > > the entire exon sequence and coordinates? Thanks a lot! > > > Sabrina -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
[aroma.affymetrix] EXON question
Hello, all: I used FIRMA to find potential spliced genes. But as we know, the probesets from affy exon array could be out of exon boundary or just cover part of exons. I wonder if I have the sequence of the probeset (which I get from Affy website), how do I do it in batch to find whether it is in an exon (ENSEMBL) region , or if it is, how do I get the entire exon sequence and coordinates? Thanks a lot! Sabrina -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en
[aroma.affymetrix] Re: exon array probe level value
Hi, Mark: On Sep 6, 3:58 am, Mark Robinson wrote: > Hi Sabrina. > > Please keep responses on-list in case others have a similar question. I think I just accidentally hit the reply to author, sorry about that! I solved the CDF problem because of one typo :) > > > The other question i have is that even if I use the existing CDF file, > > some of the gene expression or probe level expressions are really low, > > like 1.1. It seemed to me t hat they are just too low. These were > > obtained after the background normalization etc. > > I used Rmabackgroundcorrection > > I need a little more convincing before I get too worried about these > things. Is there any reason to suspect there is something wrong > here? What does the raw (i.e. before BG adjustment) data look like > for these units that have low expression? What does the data for this > unit look like after BG adjustment but before normalization? What > does the data for this unit look like before PLM fitting but after > normalization? Anything unusual in the commands you have run? Like > taking logs twice or BG adjusting data that has already been > preprocessed, etc.? I am still struggling about this issue. What I am trying to figure out is how to filter out background noise. I know that in the AROMA process, I do background correction, then normalization, then firma model at probeset level. After I finished normalization, (ExonRmaPlm), I took a look at the gene level and probe level distribution, here is an example: summary(pbsetSumm) #this is for each probe on the array #output: Min. : 0.8715 Min. : 0.9028 Min. : 0.9475 Min. : 0.9499 1st Qu.: 4.6625 1st Qu.: 4.5951 1st Qu.: 4.6517 1st Qu.: 4.6143 Median : 6.3976 Median : 6.3863 Median : 6.3850 Median : 6.3803 Mean : 6.3778 Mean : 6.3743 Mean : 6.3652 Mean : 6.3766 3rd Qu.: 7.9876 3rd Qu.: 8.0125 3rd Qu.: 7.9833 3rd Qu.: 8.0051 Max. :14.4451 Max. :14.3972 Max. :14.4558 Max. : 14.4009 In the original FIRMA paper, I noticed it filtered out all probes has expression level <3 (log2 scale) and in the simulation study, you used the two different values of Muc (7 and 10) to mimic two different scenarios, one close to background, one far above background. If I filter out these expression level <3 (only for the first array), I will filter out 16760 probes , which is about 8% of probes, but then the question is: if I want to detect exon skipping, the transcript with skipped exons should have background level expression for that exon , am I correct? If I am correct, then if I filter out these so- called background noise probes, will I miss these exon skipping events? i am a little bit confused and any suggestions are welcome! Thanks! Sabrina > Just some thoughts. > > Cheers, > Mark > > On 4-Sep-09, at 12:49 AM, sabrina wrote: > > > > > Hi, Mark: > > How do you test if the one I created is similar to the existing core > > CDF? I don't think I include the extended or FUll probes, because I > > only included the probes with notation of core from Affy's annotation > > file > > Here is the display when I used the existing core CDF > > > Path: annotationData/chipTypes/MoEx-1_0-st-v1 > > Filename: MoEx-1_0-st-v1,coreR1,A20080718,MR.cdf > > Filesize: 30.53MB > > Chip type: MoEx-1_0-st-v1,coreR1,A20080718,MR > > RAM: 0.00MB > > File format: v4 (binary; XDA) > > Dimension: 2560x2560 > > Number of cells: 6553600 > > Number of units: 17831 > > Cells per unit: 367.54 > > Number of QC units: 1 > > > when I check one of the units > > $`6838637` > > $`6838637`$type > > [1] 1 > > > $`6838637`$direction > > [1] 1 > > > from the existing CDF: > > $`6838637`$groups$`4736172` > > $`6838637`$groups$`4736172`$x > > [1] 2273 1148 816 1391 > > > $`6838637`$groups$`4736172`$y > > [1] 832 1172 2502 540 > > > $`6838637`$groups$`4736172`$pbase > > [1] "C" "T" "A" "G" > > > $`6838637`$groups$`4736172`$tbase > > [1] "G" "A" "T" "C" > > > $`6838637`$groups$`4736172`$expos > > [1] 0 1 2 3 > > > $`6838637`$groups$`4736172`$direction > > [1] 1 > > > --- > > Here is the one I created myself > > Path: annotationData/chipTypes/MoEx-1_0-st-v1 > > Filename: MoEx-1_0-st-v1,core,2009-09-02,no1Probe,no1Exon,SS.cdf > > Filesize: 28.14MB > > Chip type: MoEx-1_0-st-v1,core,2009-09-02,no1Probe,no1Exon,SS > > RAM: 0.00MB > > File format: v4 (binary; XDA) > > Dimension: 2560x2560 > > Nu
[aroma.affymetrix] Re: smoothScatter plot
Hi, Henrik: It worked! Thanks Sabrina > library(aroma.affymetrix) Loading required package: aroma.core Loading required package: R.cache R.cache v0.1.7 (2008-02-27) successfully loaded. See ?R.cache for help. Error: package 'R.filesets' 0.5.2 was found, but >= 0.5.3 is required by 'aroma.core' On Sep 5, 9:21 pm, Henrik Bengtsson wrote: > Hi, > > this occurs with geneplotter v1.21.5 or newer. This is because we > called geneplotter::smoothScatter(), but that function was moved from > geneplotter v1.21.4 to the graphics package of R v2.9.0. When you say > it used to work, this was probably when you had R v2.8.x. > > I've provided a fix that remains backward compatibility. This will be > available for everyone in the next major release. For now, you can > try: > > source("http://www.braju.com/R/hbLite.R";) > > # Unix-like systems > installPackages("http://www.braju.com/R/repos/aroma.core_1.1.6.tar.gz";); > installPackages("http://www.braju.com/R/repos/aroma.affymetrix_1.1.4.tar.gz";); > > # Windows systems > installPackages("http://www.braju.com/R/repos/aroma.core_1.1.6.zip";); > installPackages("http://www.braju.com/R/repos/aroma.affymetrix_1.1.4.zip";); > > and restart R. > > Does it work? > > /Henrik > > On Fri, Sep 4, 2009 at 7:29 AM, sabrina wrote: > > > Hi, all: > > I was trying the example of testing quality of cel files by using > > smoothScatter. > > I installed geneplotter but when I ran the code, it gave me the > > following error: > > > Error: 'smoothScatter' is not an exported object from > > 'namespace:geneplotter' > > > Can anyone help me about this error? > > > I ran the program before, it worked fine . I am using R2.9.1. Thanks > > > Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] smoothScatter plot
Hi, all: I was trying the example of testing quality of cel files by using smoothScatter. I installed geneplotter but when I ran the code, it gave me the following error: Error: 'smoothScatter' is not an exported object from 'namespace:geneplotter' Can anyone help me about this error? I ran the program before, it worked fine . I am using R2.9.1. Thanks Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: exon array probe level value
Hi, Mark: The other question i have is that even if I use the existing CDF file, some of the gene expression or probe level expressions are really low, like 1.1. It seemed to me t hat they are just too low. These were obtained after the background normalization etc. Is there any way that I can tell whether they are just background noise? I used Rmabackgroundcorrection Thanks Sabrina On Sep 2, 6:17 pm, Mark Robinson wrote: > Hi Sabrina. > > Hmm, that does seem unexpected. I would expect that if there are just > a few probes picked out, it shouldn't change that much. > > Can you give more details about this new CDF you made? Would it be > similar to the existing 'core' CDF? Presumably, there will be a large > overlap with the existing CDF. Can you verify this? And, give some > numbers on this. A few things like: total number of probes for each > CDF, for a given unit does it have a large overlap of probes, etc. > > The one thing that comes to mind is that maybe you've included a bunch > of the 'full' or 'extended' probes in your new CDF and this would > likely lower the average. > > Cheers, > Mark > > On 3-Sep-09, at 5:07 AM, sabrina shao wrote: > > > > > > > Hi, all > > I am working on mouse Exon arrays from Affy. Because there are probes > > with multiple hits on the genome, I screened out these probes and > > exons with only one probe and transcript clusters with one exon on the > > array. After that I created a CDF file as illustrated from this group. > > My problems is: when I checked the probe level expressions (not gene > > expression) , the average is about 4.5 , and the gene expression level > > ( not surprisingly) is about similar level. And the smallest probe > > level was about 1.1. I also used the CDF file I downloaded from aroma > > here, using the same program, I got the gene expression level around > > 6.2. That is about 1.7 fold change. and the results from the new CDF > > that I created was just too low. I can't figure it out why. Can anyone > > give me some suggestions and hints? Thanks > > > Sabrina > > -- > Mark Robinson, PhD (Melb) > Epigenetics Laboratory, Garvan > Bioinformatics Division, WEHI > e: m.robin...@garvan.org.au > e: mrobin...@wehi.edu.au > p: +61 (0)3 9345 2628 > f: +61 (0)3 9347 0852 > -- --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] exon array probe level value
Hi, all I am working on mouse Exon arrays from Affy. Because there are probes with multiple hits on the genome, I screened out these probes and exons with only one probe and transcript clusters with one exon on the array. After that I created a CDF file as illustrated from this group. My problems is: when I checked the probe level expressions (not gene expression) , the average is about 4.5 , and the gene expression level ( not surprisingly) is about similar level. And the smallest probe level was about 1.1. I also used the CDF file I downloaded from aroma here, using the same program, I got the gene expression level around 6.2. That is about 1.7 fold change. and the results from the new CDF that I created was just too low. I can't figure it out why. Can anyone give me some suggestions and hints? Thanks Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: SNPs affecting EXon splicing detection
Hi, Mark I tried to run flat2Cdf, it ran towards end but told me that the memory was not enough, I am running windows xp with 4G ram. But it did read file, split file into units etc. So I tried on Linux server, but it gave me read.table error (same code I used on my PC) >>>>>> Reading TXT file ...Error in read.table(file, header = TRUE, colClasses = col.class, stringsAsFactors = FALSE, : unused argument(s) (stringsAsFactors ...) Execution halted >>>>>>> Can you suggest possible solutions? Thanks Sabrina On Jun 11, 9:41 am, Mark Robinson wrote: > Hi Sabrina. > > The Unit_ID can be any "transcript cluster" identifier of your > choice. The easiest may be to use the Affymetrix transcript cluster > identifier itself ... available from: > > http://www.affymetrix.com/analysis/downloads/current_exon/MoEx-1_0-st... > > See the 'transcript_cluster_id' column. Perhaps only take the "core" > probes, as defined in the the 'level' column? > > Note: we used Ensembl in that flat2Cdf() example since we were using a > custom organization (i.e. non-Affy) of the probesets. > > Cheers, > Mark > > On 11/06/2009, at 10:58 PM, sabrina wrote: > > > > > > > Hi, Mark: > > for the Unit_id, does it have to be Ensembl gene ID like ENSMUSG? > > Lots of genes do not have ensembl assignment from Affy annotation > > file. There are lots of missing annotaions, and I still have not found > > any good way to deal with it. Do you have any suggestions? > > > Thanks > > > Sabrina > > > On Jun 10, 12:32 am, Mark Robinson wrote: > >> Hi Sabrina. > > >> How about you try and create a 'flat' file like the one described > >> at:http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file > >> ... > > >> Presumably, you will be comfortable with the Exon Array's 'probetab' > >> file by now and possibly the Affymetrix annotation CSV file and so > >> you > >> should have access to all this information. > > >> For example, from the following table: > > >> mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st- > >> v2.probe.tab > >> Probe ID Probe Set ID probe x probe y assembly > >> seqname start stop > >> strand probe sequence target strandedness category > >> 494998 2315101 917 193 build-34/hg16 chr1 1788 > >> 1812 + > >> CACGGGAAGTCTGGGCTAAGAGACA Sense main > >> 1734213 2315101 1092 677 build-34/hg16 chr1 1973 > >> 1997 + > >> ACACCAGAAGATGAACAATGG Sense main > >> 4767517 2315101 796 1862 build-34/hg16 chr1 1992 > >> 2016 + > >> ATTAAGTTACATGCAGACAACAGGG Sense main > >> 4286427 2315101 986 1674 build-34/hg16 chr1 2006 > >> 2030 + > >> TGCCTGGTTGTGGTATTAAGTTACA Sense main > >> 5760145 2315102 144 2250 build-34/hg16 chr1 2520 > >> 2544 + > >> TCGGCCGTCGTCTTCTGCAGCTCTG Sense main > >> 671410 2315102 689 262 build-34/hg16 chr1 2523 > >> 2547 + > >> AAGTCGGCCGTCGTCTTCTGCAGCT Sense main > >> 4275780 2315102 579 1670 build-34/hg16 chr1 2526 > >> 2550 + > >> TCCAAGTCGGCCGTCGTCTTCTGCA Sense main > >> 4293462 2315102 341 1677 build-34/hg16 chr1 2531 > >> 2555 + > >> TGTGATCCAAGTCGGCCGTCGTCTT Sense main > >> 5388 2315103 267 2 build-34/hg16 chr1 2927 > >> 2951 + > >> CTGTCTGTCGACCCAGCTGGAGGCA Sense main > >> [snip] > > >> ... you see the second column is the probeset_id, which would be used > >> as the "Group_ID" column for your flat file. Depending on whether > >> you > >> are using the Ensembl CDF or the Affymetrix annotation, you would > >> need > >> to create a mapping to get the transcript cluster id column (here, > >> the > >> "Unit_ID"). Everything else you need (Probe_Sequence, X, Y, > >> Probe_ID) > >> is within the table above. > > >> Then, it would be just a matter of filtering OUT those probes that > >> overlap a SNP, which based on your mapping exercise, you must have a > >> list of. Then, make a call to the flat2Cdf() script and hopefully > >> you'll be off and running. > > >> Let me know how you go. > > >> Cheers, > >&
[aroma.affymetrix] Re: SNPs affecting EXon splicing detection
Hi, Mark: for the Unit_id, does it have to be Ensembl gene ID like ENSMUSG? Lots of genes do not have ensembl assignment from Affy annotation file. There are lots of missing annotaions, and I still have not found any good way to deal with it. Do you have any suggestions? Thanks Sabrina On Jun 10, 12:32 am, Mark Robinson wrote: > Hi Sabrina. > > How about you try and create a 'flat' file like the one described > at:http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file... > > Presumably, you will be comfortable with the Exon Array's 'probetab' > file by now and possibly the Affymetrix annotation CSV file and so you > should have access to all this information. > > For example, from the following table: > > mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st- > v2.probe.tab > Probe ID Probe Set ID probe x probe y assembly seqname start > stop > strand probe sequence target strandedness category > 494998 2315101 917 193 build-34/hg16 chr1 1788 1812 + > > CACGGGAAGTCTGGGCTAAGAGACA Sense main > 1734213 2315101 1092 677 build-34/hg16 chr1 1973 1997 + > > ACACCAGAAGATGAACAATGG Sense main > 4767517 2315101 796 1862 build-34/hg16 chr1 1992 2016 + > > ATTAAGTTACATGCAGACAACAGGG Sense main > 4286427 2315101 986 1674 build-34/hg16 chr1 2006 2030 + > > TGCCTGGTTGTGGTATTAAGTTACA Sense main > 5760145 2315102 144 2250 build-34/hg16 chr1 2520 2544 + > > TCGGCCGTCGTCTTCTGCAGCTCTG Sense main > 671410 2315102 689 262 build-34/hg16 chr1 2523 2547 + > > AAGTCGGCCGTCGTCTTCTGCAGCT Sense main > 4275780 2315102 579 1670 build-34/hg16 chr1 2526 2550 + > > TCCAAGTCGGCCGTCGTCTTCTGCA Sense main > 4293462 2315102 341 1677 build-34/hg16 chr1 2531 2555 + > > TGTGATCCAAGTCGGCCGTCGTCTT Sense main > 5388 2315103 267 2 build-34/hg16 chr1 2927 2951 + > > CTGTCTGTCGACCCAGCTGGAGGCA Sense main > [snip] > > ... you see the second column is the probeset_id, which would be used > as the "Group_ID" column for your flat file. Depending on whether you > are using the Ensembl CDF or the Affymetrix annotation, you would need > to create a mapping to get the transcript cluster id column (here, the > "Unit_ID"). Everything else you need (Probe_Sequence, X, Y, Probe_ID) > is within the table above. > > Then, it would be just a matter of filtering OUT those probes that > overlap a SNP, which based on your mapping exercise, you must have a > list of. Then, make a call to the flat2Cdf() script and hopefully > you'll be off and running. > > Let me know how you go. > > Cheers, > Mark > > On 10/06/2009, at 1:00 PM, sabrina wrote: > > > > > > > Thanks , Mark! > > Can you show me /walk me through how to get a new snp-free CDF ? I > > finally got the right version of snp and probe mapping so I am ready > > to try it out! > > > Sabrina > > > On Jun 6, 3:14 am, Mark Robinson wrote: > >> Hi Sabrina. > > >> Comments below. > > >> On 06/06/2009, at 1:57 AM, sabrina wrote: > > >>> Hi, Mark: > >>> I finally found the SNP data set that is suitable for my case. As I > >>> understand, aroma used RMA to estimate gene level and exon level > >>> intensities. After I estimate gene level (transcript level), I can > >>> use > >>> FIRMA to estimate residual for each exon and compose a score as > >>> described in the paper . My question is: if there is a SNP > >>> difference > >>> between two strains within one exon, should I exclude that exon from > >>> estimating transcript level value? My guess is probably no. > > >> If the SNP affects only 1 probe in an entire transcript, I would > >> expect it to have very little impact on the gene-level summary. And, > >> especially so if there are a large number of total probes for that > >> gene. It may have a noticeable effect on the probe effect. > > >>> So will it > >>> be a good idea if I exclude that exon after I calculate all FIRMA > >>> scores or should I exclude these exons after I estimate residuals , > >>> but only used these residuals not affected by SNPs for firma score > >>> estimation? Thanks > > >> Keep in mind the residuals are calculated at the pro
[aroma.affymetrix] Re: SNPs affecting EXon splicing detection
Thanks , Mark! Can you show me /walk me through how to get a new snp-free CDF ? I finally got the right version of snp and probe mapping so I am ready to try it out! Sabrina On Jun 6, 3:14 am, Mark Robinson wrote: > Hi Sabrina. > > Comments below. > > On 06/06/2009, at 1:57 AM, sabrina wrote: > > > > > Hi, Mark: > > I finally found the SNP data set that is suitable for my case. As I > > understand, aroma used RMA to estimate gene level and exon level > > intensities. After I estimate gene level (transcript level), I can use > > FIRMA to estimate residual for each exon and compose a score as > > described in the paper . My question is: if there is a SNP difference > > between two strains within one exon, should I exclude that exon from > > estimating transcript level value? My guess is probably no. > > If the SNP affects only 1 probe in an entire transcript, I would > expect it to have very little impact on the gene-level summary. And, > especially so if there are a large number of total probes for that > gene. It may have a noticeable effect on the probe effect. > > > So will it > > be a good idea if I exclude that exon after I calculate all FIRMA > > scores or should I exclude these exons after I estimate residuals , > > but only used these residuals not affected by SNPs for firma score > > estimation? Thanks > > Keep in mind the residuals are calculated at the probe-level, not the > probeset-level. The FIRMA score is then a summary of the all the > residuals for a probeset. > > I think you have (at least) 3 choices: > > 1. (preferred, i would think) you could remove all affected *probes* > (via the creation of a SNP-affected-probe-free CDF) in advance, then > run FIRMA as normal. I can help with this if you tell me which probes > are affected. > > 2. remove the affected *probesets* afterwards, since you may not > believe the FIRMA scores for which these are based on. > > 3. as you suggested, only calculate FIRMA scores from unaffected > residuals. But, the information you require to do this is the same > information required to do #1 and it would seems like #1 is preferred. > > The good thing about option #1 is you would still have some ability to > detect differential splicing for the probeset (instead of tossing it > away), albeit with the smaller number of remaining unaffected probes. > > Cheers, > Mark > > > > > Sabrina > > > On Apr 30, 3:46 am, Mark Robinson wrote: > >> Hi Sabrina. > > >> I have not had to deal with this myself, but I do know that it exists > >> and I can at least suggest a possible route to exclude affected > >> exons. > > >> Presumably, there is a database (dbSNP?) that tells you the genome > >> locations of each SNP for your strains. There is also a probe.tab > >> file from Affymetrix that gives you the mapped genome locations of > >> each probe (or you could take the sequences from the same file and > >> map > >> them yourself with a tool like BLAT). It is then just a matter of > >> looking whether each probe maps to a location on the genome that > >> overlaps a SNP. There is probably a Bioconductor tool for this or > >> you > >> could create a hash, etc. > > >> There are a couple levels at which you might introduce this to your > >> analysis. You could remove individual probes that are affected. On > >> the aroma.affymetrix side, this would require creating a new CDF with > >> those affected probes not included (a bit tricky but doable). Or, > >> you > >> could simply post-process your existing results and remove probesets > >> that have an affected probe (easier but not as elegant). > > >> You might've also seen: > > >> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A > >> database for filtering out > >> probes in the Affymetrix GeneChip(R) HumanExon1.0 ST array > >> potentially affected bySNPs. > >> Bioinformation 2008, 2(10):469{470. > > >> Hope that gets you started. > > >> Cheers, > >> Mark > > >> On 30/04/2009, at 6:07 AM, sabrina wrote: > > >>> Hi, all: > >>> I am using Aroma for detectingexonskipping events around two groups > >>> (two different strains). I found out that several of my top hits > >>> indeed includes at least one SNP between two strains. I wonder if > >>> anyone has some suggestion about how to deal with this situation. > >>> If I > >>> need to remove all affected exons from analys
[aroma.affymetrix] Re: SNPs affecting EXon splicing detection
Hi, Mark: I finally found the SNP data set that is suitable for my case. As I understand, aroma used RMA to estimate gene level and exon level intensities. After I estimate gene level (transcript level), I can use FIRMA to estimate residual for each exon and compose a score as described in the paper . My question is: if there is a SNP difference between two strains within one exon, should I exclude that exon from estimating transcript level value? My guess is probably no. So will it be a good idea if I exclude that exon after I calculate all FIRMA scores or should I exclude these exons after I estimate residuals , but only used these residuals not affected by SNPs for firma score estimation? Thanks Sabrina On Apr 30, 3:46 am, Mark Robinson wrote: > Hi Sabrina. > > I have not had to deal with this myself, but I do know that it exists > and I can at least suggest a possible route to exclude affected exons. > > Presumably, there is a database (dbSNP?) that tells you the genome > locations of each SNP for your strains. There is also a probe.tab > file from Affymetrix that gives you the mapped genome locations of > each probe (or you could take the sequences from the same file and map > them yourself with a tool like BLAT). It is then just a matter of > looking whether each probe maps to a location on the genome that > overlaps a SNP. There is probably a Bioconductor tool for this or you > could create a hash, etc. > > There are a couple levels at which you might introduce this to your > analysis. You could remove individual probes that are affected. On > the aroma.affymetrix side, this would require creating a new CDF with > those affected probes not included (a bit tricky but doable). Or, you > could simply post-process your existing results and remove probesets > that have an affected probe (easier but not as elegant). > > You might've also seen: > > Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A > database for filtering out > probes in the Affymetrix GeneChip(R) HumanExon1.0 ST array > potentially affected bySNPs. > Bioinformation 2008, 2(10):469{470. > > Hope that gets you started. > > Cheers, > Mark > > On 30/04/2009, at 6:07 AM, sabrina wrote: > > > > > Hi, all: > > I am using Aroma for detectingexonskipping events around two groups > > (two different strains). I found out that several of my top hits > > indeed includes at least one SNP between two strains. I wonder if > > anyone has some suggestion about how to deal with this situation. If I > > need to remove all affected exons from analysis, how can I do it? I > > never worked with SNP data before, can anyone give me a hint? Thanks a > > lot! > > > Sabrina > > -- > Mark Robinson > Epigenetics Laboratory, Garvan > Bioinformatics Division, WEHI > e: m.robin...@garvan.org.au > e: mrobin...@wehi.edu.au > p: +61 (0)3 9345 2628 > f: +61 (0)3 9347 0852 > -- --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] SNPs affecting EXon splicing detection
Hi, all: I am using Aroma for detecting exon skipping events around two groups (two different strains). I found out that several of my top hits indeed includes at least one SNP between two strains. I wonder if anyone has some suggestion about how to deal with this situation. If I need to remove all affected exons from analysis, how can I do it? I never worked with SNP data before, can anyone give me a hint? Thanks a lot! Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] view exon effect from intensity plot and FIRMA model
Hi, all: I am trying to understand the supplementary text of the original FIRMA paper published in Bioinformatics last year. the S1 figure shows UNR gene about its exon effect removed or retained. I am confused about the plots. what are the y-axis? Are they residuals from RMA fit , after running ExonRMAPlm? Let d be the normalized intensity (log2), r as the residual from plm fit (log2), p as the probe affinity. I would assume S1(a) is just simple plot of d-p, S1(b) as d-p+estimated e(j)? Am I correct? If that is the case, S1(a) should show exact same information as plot of d because for all arrays, the probe affinity should be the same? But from the figures, how can I tell where the splicing events occur? I am puzzled because I thought it should also remove gene effect (chip effect)? Can some one explain it to me? Thanks Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: Error: cannot allocate vector of size 692.8 Mb
Hi, Mark: Quick question. I was looking at the example of creating cdfs from scratch. I downloaded biomaRt , but not sure where to get or how to create the exonBoundary file. Can you give me more information on that? Thanks! Sabrina On Jan 30, 1:13 am, Mark Robinson wrote: > Hi Sabrina. > > Comments below. > > On 30/01/2009, at 7:38 AM, sabrina wrote: > > > > > Hi, Mark: > > Thanks for the suggestions. > > The reason I was using FULL annotation is from what I read of that > > FIRMA paper. It mentioned that " if this is detection of novel exons, > > then all probesets might be used", so I was just giving it a try to > > see what happens. :) > > I also noticed that FirmaModel also has 3 different summary methods, > > "median" as default, "upperquartile" and "max". Due to curiosity, I > > used the other two methods, however I only got NAs for all > > trascript_clusters. Notice that in the paper, the authors suggested to > > use lower quartile or minimum, I wonder if there is any coding error. > > Of course, I have no way to check the code, just wondering... > > Its no error, it was intentional. You will notice (see > getFitUnitGroupFunction.FirmaModel) that in addition to > 'summaryMethod', there is also an 'operateOn' argument. When > operateOn="residuals" (default), you currently only have the choices > of 'mean' or 'median' for 'summaryMethod', which is why you get NAs. > We could add minimum and lowest absolute, but the paper states "... > Ultimately, we determined that the median of the residuals in an exon > gave the best tradeoff between sensitivity to the size and sign of the > residuals and robustness to the small number of probes ...". When > operateOn="weights", then you have summaryMethod must be in {median, > upperQuartile, max}. Again, other possibilities could be made > available. > > > > > On Jan 28, 10:12 pm, Mark Robinson wrote: > >> Hi Sabrina. > > >> First of all, you are somewhat in unchartered waters here. I > >> personally don't recommend using the 'full' CDF for FIRMA analysis. > >> Others can disagree (and I would encourage some discussion about > >> this ...), but my reasoning is that the majority of the probes in the > >> full CDF are querying poorly annotated or predicted exonic regions of > >> the genome. From what I've seen (on Human Exon 1.0 data), the probe > >> intensities for these are mostly at background and could have an > >> unsavoury effect on the PLM modelling ... and passing that on to > >> FIRMA. By restricting to core/Ensembl probes, you lessen the effect > >> of non-responding probes. > > >> From thinking about it (although I haven't actually done on my > >> datasets), I would suggest a two-stage analysis: > > >> 1. PLM/FIRMA analysis on a set of probes in well-annotated regions. > > > Do you mean to use the coreR1 annotation? I am not quite sure that I > > got it. Do you mean specific chromosome? If that is the case, how can > > I pass that to FirmaModel? I don't see any parameter that I can use :( > > Sorry, I'm mostly thinking out loud here. I don't have a vignette > written up for this workflow, just ideas in my head. Perhaps > Elizabeth has some comment about what to do with the novel exons. > > So, yes, by 'well-annotated', I mean to use the 'coreR1' annotation in > step 1. Or least a CDF that includes probes up to the level of > annotation that you are comfortable with. Bit of work required, but > you'll notice that we've created Ensembl-based annotation for the > Human Exon arrays. Thats just an alternative (more expansive) > definition of the 'core' gene models than what Affy gives. > > >> 2. Differential expression analysis on the remaining probes -- > >> strongly differentially expressed ones may be indicative of > >> transcripts/variants that are not covered in the well-annotated set. > > > Again, I am not sure what to define the remaining probes. Can you give > > me a hint? > > So what I understood is that I could do exon -AS using core > > annotation, then use Full annotation just do gene summary for the > > second stage analysis? I guess I am bit confused :) > > Off the top of my head, here are a few more details: > > 1. Do standard FIRMA analysis on 'coreR1' probesets. > > 2. Using the same normalized data, fit 'ExonRmaPlm' with > 'mergeGroups=FALSE' (
[aroma.affymetrix] Re: Error: cannot allocate vector of size 692.8 Mb
Hi, Mark: Thanks for the suggestions. The reason I was using FULL annotation is from what I read of that FIRMA paper. It mentioned that " if this is detection of novel exons, then all probesets might be used", so I was just giving it a try to see what happens. :) I also noticed that FirmaModel also has 3 different summary methods, "median" as default, "upperquartile" and "max". Due to curiosity, I used the other two methods, however I only got NAs for all trascript_clusters. Notice that in the paper, the authors suggested to use lower quartile or minimum, I wonder if there is any coding error. Of course, I have no way to check the code, just wondering... On Jan 28, 10:12 pm, Mark Robinson wrote: > Hi Sabrina. > > First of all, you are somewhat in unchartered waters here. I > personally don't recommend using the 'full' CDF for FIRMA analysis. > Others can disagree (and I would encourage some discussion about > this ...), but my reasoning is that the majority of the probes in the > full CDF are querying poorly annotated or predicted exonic regions of > the genome. From what I've seen (on Human Exon 1.0 data), the probe > intensities for these are mostly at background and could have an > unsavoury effect on the PLM modelling ... and passing that on to > FIRMA. By restricting to core/Ensembl probes, you lessen the effect > of non-responding probes. > > From thinking about it (although I haven't actually done on my > datasets), I would suggest a two-stage analysis: > > 1. PLM/FIRMA analysis on a set of probes in well-annotated regions. Do you mean to use the coreR1 annotation? I am not quite sure that I got it. Do you mean specific chromosome? If that is the case, how can I pass that to FirmaModel? I don't see any parameter that I can use :( > > 2. Differential expression analysis on the remaining probes -- > strongly differentially expressed ones may be indicative of > transcripts/variants that are not covered in the well-annotated set. > Again, I am not sure what to define the remaining probes. Can you give me a hint? So what I understood is that I could do exon -AS using core annotation, then use Full annotation just do gene summary for the second stage analysis? I guess I am bit confused :) > To answer your question below, the table of FIRMA values shouldn't > actually be all that large, so this should be possible. One thing to > check first of all is that you have a relatively clean R session. Do > you have tables/data frames/objects in memory that are consuming a > bunch of space? aroma.affymetrix is memory efficient, but it will > need some room to work. I just ran an example here on the FULL CDF > and it appears to tick along just fine, though I can see if the memory > spikes. > > As always when reporting errors, it doesn't hurt to give the results > of 'sessionInfo()', 'traceback()' ... and you could even set > 'verbose=-40' (say) in your call to 'extractDataFrame' to see where it > all goes wrong. Thanks for the suggestion! I will start to do that! Sabrina > > Hope that helps. > > Cheers, > Mark > > On 28/01/2009, at 3:45 AM, sabrina wrote: > > > > > > > Hi, all: > > I am using FIRMA model for my Mouse Exon array analysis. I am testing > > the FULL annotation. After I used FirmaModel and fit it, I used the > > following code to extract Firma Scores: > > exFirma<-extractDataFrame(firmaScore,addNames=TRUE); > > > however, I got the following error: > > > Error: cannot allocate vector of size 692.8 Mb > > In addition: Warning messages: > > 1: In getUnitGroupCellMap.FirmaFile(ce, ...) : > > Reached total allocation of 1535Mb: see help(memory.size) > > > I used memory.size(), it gave me: 903.4713 > > and memory.limit() is: 1535.875 > > > Does anyone have any suggestion to solve this problem? Thanks!!! > > > Sabrina > > -- > Mark Robinson > Epigenetics Laboratory, Garvan > Bioinformatics Division, WEHI > e: m.robin...@garvan.org.au > e: mrobin...@wehi.edu.au > p: +61 (0)3 9345 2628 > f: +61 (0)3 9347 0852 > -- --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Error: cannot allocate vector of size 692.8 Mb
Hi, all: I am using FIRMA model for my Mouse Exon array analysis. I am testing the FULL annotation. After I used FirmaModel and fit it, I used the following code to extract Firma Scores: exFirma<-extractDataFrame(firmaScore,addNames=TRUE); however, I got the following error: Error: cannot allocate vector of size 692.8 Mb In addition: Warning messages: 1: In getUnitGroupCellMap.FirmaFile(ce, ...) : Reached total allocation of 1535Mb: see help(memory.size) I used memory.size(), it gave me: 903.4713 and memory.limit() is: 1535.875 Does anyone have any suggestion to solve this problem? Thanks!!! Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: exon array with technical replicates
Hi, Mark: I did options, it gave me the exact display as what you showed to me.:) Here is some of my confusion: I checked the exFirma scores of these NAs to see what transcript_cluster_id they were assigned to so that I could check whether they do have >5000 probes assigned to it. One of them is : 6824548. I went back to the file I saved for # of probes per exon to check how many probes it has. To my surprise, it only has one probeset and one probes: 6824548.4897698 which matches the probeset_id (4897698). Then I went ahead to download the affymetrix annotation file, transcript.csv and probeset.csv. In the transcript.csv file, 6824548 has 17 probes in total, and in the probeset.csv, it shows that transcript_cluster_id : 6824548 has 5 probesets, 4897698 is one of them with only one probe assigned to it. I tried the extenedR1 cdf version, it gave me the same thing (in terms of # of probes per exon, using nProbesPerExon<-readCdfNbrOfCellsPerUnitGroup(getPathname(cdf)) I am bit confused. It seemed to me that in the cdf file, some of the probesets for the gene were missing for this transcript_cluster_id. I wonder if you can help me out on this . I noticed that you created the cdf files so you are the right person to ask :) The code I generated the # of probes per exon is: nProbesPerExon<-readCdfNbrOfCellsPerUnitGroup(getPathname(cdf)) nProbesPerExonVector<-unlist(nProbesPerExon) Thanks and Have a great weekend Sabrina On Jan 22, 5:18 pm, Mark Robinson wrote: > Hi Sabrina. > > See below. > > On 23/01/2009, at 3:17 AM, sabrina wrote: > > > > > > > Hi, Mark: > > Thanks for the suggestions. I think I will go with one of the > > replicates for now, just to make things simple, later on I will deal > > with the replicates. > > Now, here is another problem I have. I used the following code to > > generate the firma score: > > > firma <-FirmaModel(plm); > > fit(firma,verbose=verbose) > > firmaScore<- getFirmaScores(firma); > > > exFirma<-extractDataFrame(firmaScore,addNames=TRUE); > > exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)]) > > > then when I use limma fit, it gave me warnings: > > Warning message: > > In lmFit(exFirma[, 6:ncol(exFirma)], mm) : > > Some coefficients not estimable: coefficient interpretation may > > vary. > > > It turned out that there are several rows of exFirma (even before the > > log2) were NAs. But when I check the overall exon expressions (use > > ExonRmaPlm with para: mergeGroups=FALSE), for that specific exon (I > > take the group name as the exon id here), they have real values. I > > wonder where went wrong, perhaps fit(firma) step? > > I'll make a wager that this is due to a small number of probesets that > have a large (>500 say) probes assigned to them. You are using the > Mouse Exon array and I know there are some of these probesets in > there. In the interest of time, aroma.affymetrix changes over to > median polish (faster), or skips the probeset altogether, depending on > how large. > > You can see the default settings in options(). For me (and probably > you as well), it is: > > > options("aroma.affymetrix.settings") > [...] > > $aroma.affymetrix.settings$models > $aroma.affymetrix.settings$models$RmaPlm > $aroma.affymetrix.settings$models$RmaPlm$medianPolishThreshold > [1] 500 6 > > $aroma.affymetrix.settings$models$RmaPlm$skipThreshold > [1] 5000 1 > > So, I would just remove these rows before you use lmFit(). > > Cheers, > Mark > > > > > > > Thanks! > > > Sabrina > > > On Jan 20, 7:48 pm, Mark Robinson wrote: > >> Hi Sabrina. > > >>>> Do you have biological replicates of some samples and technical > >>>> replicates of others? Or, just technical replicates of everything? > > >>> My experiment has two groups, one has 5 samples (biological > >>> replicates, that is 5 mice from one strain), and the other has 4 > >>> samples. Among 5 samples of the first group, there are two samples > >>> hybridized twice to two arrays, so I have 4 arrays for that two > >>> samples > >>> ( That is what I meant as technical replicate). Does that make > >>> sense? > > >> OK, now I get it. Thanks. > > >>>> I suspect it would be difficult to justify adding a bunch of extra > >>>> (adhoc) steps into the FIRMA pipeline. I don't have a full > >>>> understanding of your experiment, but what about just dealing > >>>> with it > >>>> when you operate on the FIRMA scores? When you say "average on >
[aroma.affymetrix] Re: exon array with technical replicates
Hi, Mark: Thanks for the suggestions. I think I will go with one of the replicates for now, just to make things simple, later on I will deal with the replicates. Now, here is another problem I have. I used the following code to generate the firma score: firma <-FirmaModel(plm); fit(firma,verbose=verbose) firmaScore<- getFirmaScores(firma); exFirma<-extractDataFrame(firmaScore,addNames=TRUE); exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)]) then when I use limma fit, it gave me warnings: Warning message: In lmFit(exFirma[, 6:ncol(exFirma)], mm) : Some coefficients not estimable: coefficient interpretation may vary. It turned out that there are several rows of exFirma (even before the log2) were NAs. But when I check the overall exon expressions (use ExonRmaPlm with para: mergeGroups=FALSE), for that specific exon (I take the group name as the exon id here), they have real values. I wonder where went wrong, perhaps fit(firma) step? Thanks! Sabrina On Jan 20, 7:48 pm, Mark Robinson wrote: > Hi Sabrina. > > >> Do you have biological replicates of some samples and technical > >> replicates of others? Or, just technical replicates of everything? > > > My experiment has two groups, one has 5 samples (biological > > replicates, that is 5 mice from one strain), and the other has 4 > > samples. Among 5 samples of the first group, there are two samples > > hybridized twice to two arrays, so I have 4 arrays for that two > > samples > > ( That is what I meant as technical replicate). Does that make sense? > > OK, now I get it. Thanks. > > > > >> I suspect it would be difficult to justify adding a bunch of extra > >> (adhoc) steps into the FIRMA pipeline. I don't have a full > >> understanding of your experiment, but what about just dealing with it > >> when you operate on the FIRMA scores? When you say "average on plm", > >> I assume this means an average of the chip effects for those two > >> samples? > > > Yes, that is what I meant. By averaging chipEffects, I actually > > average the gene signals from the two arrays. > > >> You could fit a PLM that estimates a single chip effect for > >> those two samples and use that for calculating FIRMA scores. > > > Do you mean to fit a plm for these two arrays for one sample > > separately from the other samples? If I just fit plm for these two > > arrays (say sample 1),will it estimate different probe affinity? If it > > does , then these chipEffects (gene signals to estimate FIRMA score) > > won't be compatible , am I correct? Thanks! > > I'm not suggesting to fit multiple PLMs for the same gene and somehow > combine them. What I'm suggesting is a single PLM, but where there is > only 1 chip effect parameter for those 2 samples. From a design > matrix point of view, this is conceptually straightforward. On the > top of my head though, I don't know how to get the 'preprocessCore' > code (this is used under the hood in aroma.affymetrix to do the > fitting) to fit such a model. It may have to be a one-off. > > Another alternative is to average the FIRMA scores (subsequent to the > standard PLM fitting) for these 2 samples and do a 4 versus 4 > comparison to look for changes in FIRMA scores. > > And yet another alternative is to ignore this altogether. Its > unlikely (maybe? feel free to disagree) that 1 technical replicate > amongst 8 biological replicates would cause you to underestimate the > variability so much as to significantly overstate the changes you > see ... but thats just a hunch. Presumably, you'd also be validating > the major discoveries that you make. > > Hope that helps. > > Mark --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: exon array with technical replicates
Hi, Mark: > Do you have biological replicates of some samples and technical > replicates of others? Or, just technical replicates of everything? My experiment has two groups, one has 5 samples (biological replicates, that is 5 mice from one strain), and the other has 4 samples. Among 5 samples of the first group, there are two samples hybridized twice to two arrays, so I have 4 arrays for that two samples ( That is what I meant as technical replicate). Does that make sense? > > I suspect it would be difficult to justify adding a bunch of extra > (adhoc) steps into the FIRMA pipeline. I don't have a full > understanding of your experiment, but what about just dealing with it > when you operate on the FIRMA scores? When you say "average on plm", > I assume this means an average of the chip effects for those two > samples? Yes, that is what I meant. By averaging chipEffects, I actually average the gene signals from the two arrays. >You could fit a PLM that estimates a single chip effect for > those two samples and use that for calculating FIRMA scores. Do you mean to fit a plm for these two arrays for one sample separately from the other samples? If I just fit plm for these two arrays (say sample 1),will it estimate different probe affinity? If it does , then these chipEffects (gene signals to estimate FIRMA score) won't be compatible , am I correct? Thanks! Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] exon array with technical replicates
Hi, all: I am working on Affy Mouse Exon Array . Because of the experiment design and quality of the hybridization, we have two arrays hybridized from one mouse (same biological sample). I assume that I should treat these two arrays as tehcnical duplicates. If that is the case, I could do background correction, normalization and summary separately for these two arrays,(RMA, and ExonRmaPlm), then before I use FIRMA to get firma scores, can I just do average on plm of these two arrays? But then what do I feed in FIRMAModel? ( The default one is just plm results directly from ExonRmaPlm) .Or any suggestions about how to deal with this situation? My goal is to find novel splicing events, but right now I am just using core annotation to try it out. Thanks! Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to aroma-affymetrix-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: FIRMA score
Hi, Mark! Thank you so much! After an intensive study of FIRMA and aroma.affymetrix, I think I sort of figure out a routine for my exon study. And now I am more into R programming instead of C or Matlab :) One side question I still have is the probe affinity effect. I never used it and just wonder what is it for? Maybe I still don't fully understand FIRMA model very well :( Thanks! Sabrina On Dec 10, 5:34 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > > I assume that you referred me to the probeset annotation csv file , > > not the transcript annotation csv file. :) > > > I did get the exon coordinates! it seemed to me that the probeset_id > > in the probeset annotation file is equivalent to groupName of > > exFirma. > > Yes! The transcript CSV file may be useful for other things, but > indeed not for exon coordinates. > > In aroma.affymetrix terminology for exon arrays, > unit=transcript_cluster_id, group=probeset_id and cell=probe. > > > One question I have is in the exon array plot from GenomeGraphs, the > > unrData is the probe level data, in other words, in my case, I should > > use the normalized data, csN, is that correct? > > If you look closely at the Purdom paper, there are a few things you > may want to plot in the context of Ensembl annotation. The normalized > data is the obvious one, which you have linked to an object 'csN'. > > In addition, you may want to plot the residuals: > > rs <- calculateResiduals(plmTr) # plmTr is the fitted ExonRmaPlm with > mergeGroups=TRUE > d <- extractMatrix(rs,cells=...) > > ... or for example you may want to plot the raw data, adjusted by the > probe effects. For example, RMA fits the model > > Y_{ij} = a_i + b_j > > (Y_{ij} = normalized data, a_i = chip effects, b_j = probe effects), > you may be interested in plotting: > > Y_{ij} - \hat{b}_j > > ... i.e. raw data, adjusted by the estimated probe effects. > > > another question is graph related though. Since I have two groups > > among my 9 samples, when I plot exon array, is there any way to use > > different color coding for these samples on the same graph? Thanks! > > Short answer: > > library(GenomeGraphs) > ?"ExonArray-class" > > Long answer: create your ExonArray object and use DisplayPars slot to > specify colour and line width: > > ea1<-new("ExonArray", intensity = d, probeStart = ..., probeEnd=..., > probeId = ..., nProbes = ..., dp = DisplayPars(color = > col, lwd=lwd, > mapColor = "dodgerblue2",plotMap=TRUE)) > > where 'col' and 'lwd' are either length 1 (in which case all lines get > the same width and colour) or vectors of the length of the number of > samples ... > > Thanks for all these questions Sabrina. This will help make a nice > vignette ... when I get time :) > > Unless you'd be interested in summarizing your approach! :) > > Cheers, > Mark > > > Sabrina > > > On Dec 9, 5:02 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > >> Hi Sabrina. > > >> Great work! See below. > > >> On 10/12/2008, at 5:27 AM, sabrina wrote: > > >>> Hi, Mark! > >>> Thank you so much! I think I pretty much figured out how to get the > >>> gene level, exon level expressions and comparison done. I checked > >>> the > >>> GenomeGraphs as you suggested. I tried the code in the exon array > >>> section, and it worked fine. So here is the question related to my > >>> data set. > > >>> In order to plot exon data, I used the following code to get exon > >>> summary (intensities) which is pbsetSummLog2: > > >>> plmNoMerge<- ExonRmaPlm(csN, mergeGroups=FALSE) > >>> fit(plmNoMerge) > >>> readUnits(plmNoMerge,units=1) > > >>> chpNoMerge<-getChipEffectSet(plmNoMerge) > > >>> pbsetSumm<-extractMatrix(chpNoMerge,returnUgcMap=TRUE) > >>> pbsetSummLog2<-log2(pbsetSumm); > >>> pbsetNames<-readCdfGroupNames(getPathname(getCdf(chpNoMerge)), > >>> unit=unique(attr(pbsetSumm,"unitGroupCellMap")[,"unit"])) > >>> rownames(pbsetSumm)<-unlist(pbsetNames) > >>> rownames(pbsetSummLog2)<-unlist(pbsetNames) > > >>> Because each significant exon that was detected by FIRMA is > >>> associated > >>> with one gene, and that gene has several exons, therefore, I used > >>> the > >>> following to find all exons associated to that gene: > >>> (x is the result fr
[aroma.affymetrix] Re: FIRMA score
Thanks! Mark: I assume that you referred me to the probeset annotation csv file , not the transcript annotation csv file. :) I did get the exon coordinates! it seemed to me that the probeset_id in the probeset annotation file is equivalent to groupName of exFirma. One question I have is in the exon array plot from GenomeGraphs, the unrData is the probe level data, in other words, in my case, I should use the normalized data, csN, is that correct? another question is graph related though. Since I have two groups among my 9 samples, when I plot exon array, is there any way to use different color coding for these samples on the same graph? Thanks! Sabrina On Dec 9, 5:02 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > Hi Sabrina. > > Great work! See below. > > On 10/12/2008, at 5:27 AM, sabrina wrote: > > > > > > > Hi, Mark! > > Thank you so much! I think I pretty much figured out how to get the > > gene level, exon level expressions and comparison done. I checked the > > GenomeGraphs as you suggested. I tried the code in the exon array > > section, and it worked fine. So here is the question related to my > > data set. > > > In order to plot exon data, I used the following code to get exon > > summary (intensities) which is pbsetSummLog2: > > > plmNoMerge<- ExonRmaPlm(csN, mergeGroups=FALSE) > > fit(plmNoMerge) > > readUnits(plmNoMerge,units=1) > > > chpNoMerge<-getChipEffectSet(plmNoMerge) > > > pbsetSumm<-extractMatrix(chpNoMerge,returnUgcMap=TRUE) > > pbsetSummLog2<-log2(pbsetSumm); > > pbsetNames<-readCdfGroupNames(getPathname(getCdf(chpNoMerge)), > > unit=unique(attr(pbsetSumm,"unitGroupCellMap")[,"unit"])) > > rownames(pbsetSumm)<-unlist(pbsetNames) > > rownames(pbsetSummLog2)<-unlist(pbsetNames) > > > Because each significant exon that was detected by FIRMA is associated > > with one gene, and that gene has several exons, therefore, I used the > > following to find all exons associated to that gene: > > (x is the result from topTable) > > > temp<-grep(exFirma[x$ID,1][1],exFirma[,1]); > > gp_temp<-exFirma[temp,2] > > > gene1<-pbsetSummLog2[temp,] > > > which gives me: > > array1 array 2 ... > > 4308385 6.338784 6.896304 > > 4376965 1.973171 2.272406 > > > I know that "430835" is the groupName (aka exon id), but I am not sure > > where to find the start and end position of these individual exons, > > can you show me how to do it? > > > The reason I asked this is because in makeExonArray, I need the > > probeStart and probeEnd positions. Thanks!!! > > Well, you can get the probeset start and end positions from the > 'NetAffx Annotation Files' from Affymetrix. For human, this would be > at: > > http://www.affymetrix.com/products_services/arrays/specific/exon.affx... > > I think you said you were using mouse, so you can find this at: > > http://www.affymetrix.com/products_services/arrays/specific/mouse_exo... > > Find the 'Current NetAffx Annotation Files' section and download the > csv.zip file. Just to have a quick peak at a few columns of this > file, if I run a unix tool called awk on the CSV file you get: > > > awk '{FS=","; print $7,$1,$2,$3,$4,$5,$16}' HuEx-1_0-st- > v2.na25.hg18.probeset.csv | grep -v "^ " | more > > > "transcript_cluster_id" "probeset_id" "seqname" "strand" "start" > "stop" "level" > ... > "2315373" "2315374" "chr1" "+" "742655" "742719" "core" > "2315373" "2315375" "chr1" "+" "742869" "743231" "core" > "2315373" "2315376" "chr1" "+" "743293" "743434" "core" > "2315373" "2315377" "chr1" "+" "744094" "744979" "core" > "2315380" "2315381" "chr1" "+" "752897" "752974" "extended" > "2315380" "2315382" "chr1" "+" "754246" "754274" "extended" > ... > > so there you have everything you need -- identifiers and chromosome > locations. You can just use 'read.csv' to read it into R and pull off > the coordinates you need. > > Cheers, > Mark > > > > > Sabrina > > On Dec 8, 4:54 pm, Mark Robinson <[EMAIL PR
[aroma.affymetrix] Re: FIRMA score
Hi, Mark! Thank you so much! I think I pretty much figured out how to get the gene level, exon level expressions and comparison done. I checked the GenomeGraphs as you suggested. I tried the code in the exon array section, and it worked fine. So here is the question related to my data set. In order to plot exon data, I used the following code to get exon summary (intensities) which is pbsetSummLog2: plmNoMerge<- ExonRmaPlm(csN, mergeGroups=FALSE) fit(plmNoMerge) readUnits(plmNoMerge,units=1) chpNoMerge<-getChipEffectSet(plmNoMerge) pbsetSumm<-extractMatrix(chpNoMerge,returnUgcMap=TRUE) pbsetSummLog2<-log2(pbsetSumm); pbsetNames<-readCdfGroupNames(getPathname(getCdf(chpNoMerge)), unit=unique(attr(pbsetSumm,"unitGroupCellMap")[,"unit"])) rownames(pbsetSumm)<-unlist(pbsetNames) rownames(pbsetSummLog2)<-unlist(pbsetNames) Because each significant exon that was detected by FIRMA is associated with one gene, and that gene has several exons, therefore, I used the following to find all exons associated to that gene: (x is the result from topTable) temp<-grep(exFirma[x$ID,1][1],exFirma[,1]); gp_temp<-exFirma[temp,2] gene1<-pbsetSummLog2[temp,] which gives me: array1 array 2 ... 4308385 6.3387846.896304 4376965 1.9731712.272406 I know that "430835" is the groupName (aka exon id), but I am not sure where to find the start and end position of these individual exons, can you show me how to do it? The reason I asked this is because in makeExonArray, I need the probeStart and probeEnd positions. Thanks!!! Sabrina On Dec 8, 4:54 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > Hi Sabrina. > > On 08/12/2008, at 3:25 PM, sabrina wrote: > > > > > > > Hi, Mark: > > Thanks for the help! I think now I have better understanding of LIMMA > > now after I read the related chapter of LIMMA user's guide :) > > > On Dec 4, 4:10 pm, "Mark Robinson" <[EMAIL PROTECTED]> wrote: > >> Sabrina. > > >> When asking questions like this, please post your full set of > >> commands. > >> Its hard to decode what variables have been set without this. :) > > >> I'm guessing that affyAnnotation is what came from the 'read.csv' > >> call and > >> x2 is the output of topTable? Have you set the rownames of exFirma? > > > x2 is the output of topTable > > I do not know how to set rownames of exFirma, when I check exFirma > > [1:3,1:6], it gave me the following: > > > unitName groupName unit group cell Array1 > > 1 6838637 4304927 1 1 1 1.0372756 > > 2 6838637 4330595 1 2 2 -1.1679380 > > 3 6838637 4356771 1 3 3 1.2217411 > > > Can you show me how to set the rownames? where can I find the row > > names? Is it correct that groupName is the exon id ? > > A good place to start: > > ?rownames > > > > > I used the following code to import affymetrix annotation file: > > > affyAnnotation<- read.csv(file="MoEx-1_0-st- > > v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20); > > > and > > >> Is this just a typo then? It should be > >> affyAnnotation$transcript_cluster_id instead of > >> affyAnnotation$transcript_cluster_Id. > > >> What does exFirma[x2$ID,1] actually give you? If the > >> rownames(exFirma) > >> are not set, this could mislead you. > > > I used the suggestion you gave to me to redo the lmFit, and here is > > what I used : > > > mm <- model.matrix(~cls) > > fit <- lmFit(exFirma[,6:ncol(exFirma)], mm) > > fit<-eBayes(fit) > > x<-topTable(fit,coef=2,adj="fdr") > > > Then I used exFirma[x$ID,1], it gave me: > > [1] "6996667" "6965348" "6982181" "6917790" "6899766" "7005797" > > "6880577" "6962054" "6897349" > > [10] "7018774" > > > when I printed x, it gave me: > > > ID logFC t P.Value adj.P.Val B > > 11253 11253 -4.856783 -8.524568 2.223806e-06 0.4982258 -4.323317 > > 183160 183160 4.058556 7.676406 6.413325e-06 0.7184271 -4.332955 > > 32235 32235 -3.692429 -6.555475 2.963704e-05 1.000 -4.350176 > > 132610 132610 2.315536 6.508698 3.170265e-05 1.000 -4.351036 > > 165279 165279 3.269404 6.311540 4.225007e-05 1.000 -4.354809 > > 171777 171777 3.845620 6.174738 5.172731e-05 1.000 -4.357574 > > 45380 45380 1.971388 6.088251 5.886544e-05 1.000 -4.359388 > > 132873 132873 -4.569042 -6.021868 6.505114e-0
[aroma.affymetrix] Re: FIRMA score
Hi, Mark: Thanks for the help! I think now I have better understanding of LIMMA now after I read the related chapter of LIMMA user's guide :) On Dec 4, 4:10 pm, "Mark Robinson" <[EMAIL PROTECTED]> wrote: > Sabrina. > > When asking questions like this, please post your full set of commands. > Its hard to decode what variables have been set without this. :) > > I'm guessing that affyAnnotation is what came from the 'read.csv' call and > x2 is the output of topTable? Have you set the rownames of exFirma? x2 is the output of topTable I do not know how to set rownames of exFirma, when I check exFirma [1:3,1:6], it gave me the following: unitName groupName unit group cellArray1 1 6838637 43049271 11 1.0372756 2 6838637 43305951 22 -1.1679380 3 6838637 43567711 33 1.2217411 Can you show me how to set the rownames? where can I find the row names? Is it correct that groupName is the exon id ? I used the following code to import affymetrix annotation file: affyAnnotation<- read.csv(file="MoEx-1_0-st- v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20); and > > Is this just a typo then? It should be > affyAnnotation$transcript_cluster_id instead of > affyAnnotation$transcript_cluster_Id. > > What does exFirma[x2$ID,1] actually give you? If the rownames(exFirma) > are not set, this could mislead you. I used the suggestion you gave to me to redo the lmFit, and here is what I used : mm <- model.matrix(~cls) fit <- lmFit(exFirma[,6:ncol(exFirma)], mm) fit<-eBayes(fit) x<-topTable(fit,coef=2,adj="fdr") Then I used exFirma[x$ID,1], it gave me: [1] "6996667" "6965348" "6982181" "6917790" "6899766" "7005797" "6880577" "6962054" "6897349" [10] "7018774" when I printed x, it gave me: ID logFC t P.Value adj.P.Val B 11253 11253 -4.856783 -8.524568 2.223806e-06 0.4982258 -4.323317 183160 183160 4.058556 7.676406 6.413325e-06 0.7184271 -4.332955 32235 32235 -3.692429 -6.555475 2.963704e-05 1.000 -4.350176 132610 132610 2.315536 6.508698 3.170265e-05 1.000 -4.351036 165279 165279 3.269404 6.311540 4.225007e-05 1.000 -4.354809 171777 171777 3.845620 6.174738 5.172731e-05 1.000 -4.357574 45380 45380 1.971388 6.088251 5.886544e-05 1.000 -4.359388 132873 132873 -4.569042 -6.021868 6.505114e-05 1.000 -4.360815 86037 86037 3.206732 5.890372 7.943255e-05 1.000 -4.363738 27761 27761 -2.651559 -5.825434 8.774484e-05 1.000 -4.365229 when I typed exFirma[11253,], it gave me: exFirma[11253,] unitName groupName unit group cell array1, ... 11253 6996667 5092691 535 6 11253 2.718783 ... so it appeared to me that x$ID matches the index of exFirma entry. Then I used; index<-match(exFirma[x$ID,1], affyAnnotation$transcript_cluster_id) to find the index of these entries (significant different between two groups) in Affy annotation, then I checked the entries in Affy Annotation file: affyAnnotation[index, 1:5] transcript_cluster_id probeset_id seqname strand start 71605 6996667 6996667 chr9 - 70631116 62004 6965348 6965348 chr7 + 151009070 67351 6982181 6982181 chr8 - 47660950 ... But then I don't know how to check whether it is correct or not , or how to find and plot corresponding probeset (exon ) as what was shown in the FIRMA paper supplementary . Can you give me some hints? Thanks! Sabrina > > Cheers, > Mark > > > > > Ok, I just used the following line to read Affy Annotation > > > read.csv(file="MoEx-1_0-st- > > v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20) > > > then I used > > match(exFirma[x2$ID,1],affyAnnotation$transcript_cluster_Id) > > > but I only got NAs > > > I can not figure out what went wrong because I checked the first ID, > > it is in Affy annotation file. Can you point out what I did > > incorrectly? Thanks ! > > > Sabrina > > > On Dec 3, 4:53 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > >> Sabrina. > > >> See below. > > >> On 04/12/2008, at 8:41 AM, sabrina wrote: > > >> > Hi, Mark: > > >> > Thank you so much for the detailed explanation! > >> > Actually I am working on Mouse Exon arrays. I did check the UCSC > >> > genome browser, but they do not have the Exon Array available as > >> > Human Exon. > > >> Right! Good point. > > >> > My study is trying to find alternative splicing events between two > >> > grou
[aroma.affymetrix] Re: FIRMA score
Ok, I just used the following line to read Affy Annotation read.csv(file="MoEx-1_0-st- v1.na27.mm9.transcript.csv",head=TRUE,sep=",",skip=20) then I used match(exFirma[x2$ID,1],affyAnnotation$transcript_cluster_Id) but I only got NAs I can not figure out what went wrong because I checked the first ID, it is in Affy annotation file. Can you point out what I did incorrectly? Thanks ! Sabrina On Dec 3, 4:53 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > Sabrina. > > See below. > > On 04/12/2008, at 8:41 AM, sabrina wrote: > > > Hi, Mark: > > > Thank you so much for the detailed explanation! > > Actually I am working on Mouse Exon arrays. I did check the UCSC > > genome browser, but they do not have the Exon Array available as > > Human Exon. > > Right! Good point. > > > > > My study is trying to find alternative splicing events between two > > groups with identical genotypes but different treatments. each group > > has 4 or 5 biological replicates. So I did all as described in the > > examples from Human Exon array, then used the following codes to find > > the significant genes with potential splicing events: > > > ... > > exFirma<-extractDataFrame(firmaScore,addNames=TRUE); > > exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)]) > > > library(limma) > > > design <- cbind(Grp1=c(1,1,1,1,1,0,0,0,0),Grp2=c(0,0,0,0,0,1,1,1,1)) > > > fit<-lmFit(exFirma[,6:ncol(exFirma)],design) > > fit<-eBayes(fit) > > x<-topTable(fit) > > Careful here as to what you are fitting and testing with limma. You > are fitting a parameterization that would require a contrast to be fit > ( ... such as 'contrasts.fit' ... ) in order to assess the differences > between your 2 groups. Did you get this example from a web page or a > thread somewhere? If so, I should correct it. > > Probably what you want is: > > design <- cbind(Grp1=1,Grp2=c(0,0,0,0,0,1,1,1,1)) > > ... > > topTable(fit,coef=2) > > Under this parameterization, setting coef=2 will test the difference > between the 2 groups. Note that topTable by default gives the top 10 > values, regardless of statistics. > > Does that make sense? > > > However I only got about 10 IDs with all of which adjusted P values as > > 1. I wonder if there is anything I did wrong here. Also when I checked > > the IDs back to Affymetrix annotation file, I could only find one > > match, I wonder what went wrong. > > How exactly did you match them back? > > Mark > > > > > Thanks! > > > BTW, I used the coreR1 , A20080718, MR, cdf version > > > Sabrina > > > On Dec 2, 6:27 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > >> Hi Sabrina. > > >> Indeed, you'll want to set 'mergeGroups=TRUE' for the probe level > >> model (ExonRmaPlm object) that you send to 'FirmaModel' ... > > >> -http://groups.google.com/group/aroma-affymetrix/web/human-exon-array- > >> ... > >> says: > >> ... > >> plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE) > >> print(plmTr) > >> ... > >> firma <- FirmaModel(plmTr) > >> fit(firma, verbose=verbose) > >> - > > >> NO, the unitName is not the same as gene id in NCBI. If you are > >> using > >> the CDF from: > > >>http://groups.google.com/group/aroma-affymetrix/web/affymetrix- > >> define... > > >> (e.g. HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf) the unitName is an > >> Affymetrix-defined identifier. > > >> If you go to: > > >>http://www.affymetrix.com/products_services/arrays/specific/ > >> exon.affx... > > >> you will see a 'HuEx-1_0-st-v2 Probeset Annotations' file that you > >> can > >> download. In there, you will find the annotations you need (unitName > >> corresponds to the 'transcript_cluster_id' column and groupName > >> corresponds to the 'probeset_id' column) > > >> Also, you may be interested to know that the UCSC genome browser > >> has a > >> track called 'Affy HuEx 1.0' under the set of 'Expression' tracks. > >> Unfortunately, its not searchable on the browser, but quite useful if > >> you can locate it in the browser and find out which exon is > >> differentially spliced. > > >> Hope that helps. > >> Mark > > >> On 03/12/2008, at 5:56 AM, sabrina wrote: > > >>> Hi, all: > >&g
[aroma.affymetrix] Re: FIRMA score
Hi, Mark: Thanks again, here comes more questions :) I got the example from one of the threads here, the original post was: library(limma) fit<-lmFit(exFirma[,6:ncol(exFirma)], model.matrix(~ your.class.variable)) fit<-eBayes(fit) I thought I should be ok to define the design , then fit to it. Am I wrong? Or if I want to use model.matrix, which variable should I use in terms of firmaScore? I am bit confused. . I also looked at LIMMA userguide, so I figured maybe I can do this: contrast.matrix<-makeContrasts(group1-group2,levels=design) fit2<- contrasts.fit(fit,contrast.matrix) fit2<-eBayes(fit2) x2<-topTable(fit2) but when I ran it, it gave me warning as follows: Warning message: In contrasts.fit(fit, contrast.matrix) : row names of contrasts don't match col names of coefficients Can you tell me what I did wrong here? > Careful here as to what you are fitting and testing with limma. You > are fitting a parameterization that would require a contrast to be fit > ( ... such as 'contrasts.fit' ... ) in order to assess the differences > between your 2 groups. Did you get this example from a web page or a > thread somewhere? If so, I should correct it. > > Probably what you want is: > > design <- cbind(Grp1=1,Grp2=c(0,0,0,0,0,1,1,1,1)) > > ... > > topTable(fit,coef=2) > > Under this parameterization, setting coef=2 will test the difference > between the 2 groups. Note that topTable by default gives the top 10 > values, regardless of statistics. > > Does that make sense? > > > However I only got about 10 IDs with all of which adjusted P values as > > 1. I wonder if there is anything I did wrong here. Also when I checked > > the IDs back to Affymetrix annotation file, I could only find one > > match, I wonder what went wrong. > > How exactly did you match them back? once I got x as the topTable, I used exFirma[x$ID,] to see what are the top lists, it showed me ID with unitName and Group Name. So I did manually, using unitName to find a match in AffyAnnotation file. Or should I use GroupName instead? One other problem is that Excel only can not display all rows of the annotation file, so some of them I can't not search. Is there any easy way to do it in R and display it in R? I usually use Matlab, so I am very unfamiliar with R. Thank you so much! Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] Re: FIRMA score
Hi, Mark: Thank you so much for the detailed explanation! Actually I am working on Mouse Exon arrays. I did check the UCSC genome browser, but they do not have the Exon Array available as Human Exon. My study is trying to find alternative splicing events between two groups with identical genotypes but different treatments. each group has 4 or 5 biological replicates. So I did all as described in the examples from Human Exon array, then used the following codes to find the significant genes with potential splicing events: ... exFirma<-extractDataFrame(firmaScore,addNames=TRUE); exFirma[,6:ncol(exFirma)]<-log2(exFirma[,6:ncol(exFirma)]) library(limma) design <- cbind(Grp1=c(1,1,1,1,1,0,0,0,0),Grp2=c(0,0,0,0,0,1,1,1,1)) fit<-lmFit(exFirma[,6:ncol(exFirma)],design) fit<-eBayes(fit) x<-topTable(fit) However I only got about 10 IDs with all of which adjusted P values as 1. I wonder if there is anything I did wrong here. Also when I checked the IDs back to Affymetrix annotation file, I could only find one match, I wonder what went wrong. Thanks! BTW, I used the coreR1 , A20080718, MR, cdf version Sabrina On Dec 2, 6:27 pm, Mark Robinson <[EMAIL PROTECTED]> wrote: > Hi Sabrina. > > Indeed, you'll want to set 'mergeGroups=TRUE' for the probe level > model (ExonRmaPlm object) that you send to 'FirmaModel' ... > > -http://groups.google.com/group/aroma-affymetrix/web/human-exon-array-... > says: > ... > plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE) > print(plmTr) > ... > firma <- FirmaModel(plmTr) > fit(firma, verbose=verbose) > - > > NO, the unitName is not the same as gene id in NCBI. If you are using > the CDF from: > > http://groups.google.com/group/aroma-affymetrix/web/affymetrix-define... > > (e.g. HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf) the unitName is an > Affymetrix-defined identifier. > > If you go to: > > http://www.affymetrix.com/products_services/arrays/specific/exon.affx... > > you will see a 'HuEx-1_0-st-v2 Probeset Annotations' file that you can > download. In there, you will find the annotations you need (unitName > corresponds to the 'transcript_cluster_id' column and groupName > corresponds to the 'probeset_id' column) > > Also, you may be interested to know that the UCSC genome browser has a > track called 'Affy HuEx 1.0' under the set of 'Expression' tracks. > Unfortunately, its not searchable on the browser, but quite useful if > you can locate it in the browser and find out which exon is > differentially spliced. > > Hope that helps. > Mark > > On 03/12/2008, at 5:56 AM, sabrina wrote: > > > > > Hi, all: > > I just started learning to use firma analysis. My impression from > > previous discussions of other users is that we should only use plm > > with mergeGroup=True, is that correct? Another question is : Is > > unitName same as gene uid in NCBI database? how can I map groupName to > > EXON name from NCBI? Thanks !!! > > > Sabrina > > -- > Mark Robinson > Epigenetics Laboratory, Garvan > Bioinformatics Division, WEHI > e: [EMAIL PROTECTED] > e: [EMAIL PROTECTED] > p: +61 (0)3 9345 2628 > f: +61 (0)3 9347 0852 > -- --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] FIRMA score
Hi, all: I just started learning to use firma analysis. My impression from previous discussions of other users is that we should only use plm with mergeGroup=True, is that correct? Another question is : Is unitName same as gene uid in NCBI database? how can I map groupName to EXON name from NCBI? Thanks !!! Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---
[aroma.affymetrix] plotRle
Hello: I am new to R and aroma. I just tried to run the sample scripts to make sure everything works as it was described before I start to analyze my exon array data. When I tried plotRle, I found that at the figure (boxplot) occupied all most all page, but the xlabel was cut off (only show several chars. Does anyone know how to control parameters, such as font size, ylim etc in plotRle? Thanks! Sabrina --~--~-~--~~~---~--~~ When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/aroma-affymetrix?hl=en -~--~~~~--~~--~--~---