Hi Mark, for the Unit_ID, does it have to be an Ensembl gene ID like ENSMUSGxxxx? Lots of genes do not have an Ensembl assignment in the Affy annotation file. There are lots of missing annotations, and I still have not found a good way to deal with them. Do you have any suggestions?
Thanks
Sabrina

On Jun 10, 12:32 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> Hi Sabrina.
>
> How about you try and create a 'flat' file like the one described
> at: http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file...
>
> Presumably, you will be comfortable with the Exon Array's 'probetab'
> file by now and possibly the Affymetrix annotation CSV file and so you
> should have access to all this information.
>
> For example, from the following table:
>
> mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
> Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
> 494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
> 1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACAGGGGCCAGAAGATGAACAATGG  Sense                main
> 4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
> 4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
> 5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
> 671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
> 4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
> 4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
> 5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
> [snip]
>
> ... you see the second column is the probeset_id, which would be used
> as the "Group_ID" column for your flat file. Depending on whether you
> are using the Ensembl CDF or the Affymetrix annotation, you would need
> to create a mapping to get the transcript cluster id column (here, the
> "Unit_ID"). Everything else you need (Probe_Sequence, X, Y, Probe_ID)
> is within the table above.
>
> Then, it would be just a matter of filtering OUT those probes that
> overlap a SNP, which based on your mapping exercise, you must have a
> list of. Then, make a call to the flat2Cdf() script and hopefully
> you'll be off and running.
>
> Let me know how you go.
>
> Cheers,
> Mark
>
> On 10/06/2009, at 1:00 PM, sabrina wrote:
>
> > Thanks, Mark!
> > Can you show me / walk me through how to get a new SNP-free CDF? I
> > finally got the right version of the SNP and probe mapping so I am
> > ready to try it out!
> >
> > Sabrina
> >
> > On Jun 6, 3:14 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >> Hi Sabrina.
> >>
> >> Comments below.
> >>
> >> On 06/06/2009, at 1:57 AM, sabrina wrote:
> >>
> >>> Hi, Mark:
> >>> I finally found the SNP data set that is suitable for my case. As I
> >>> understand it, aroma uses RMA to estimate gene-level and exon-level
> >>> intensities. After I estimate the gene level (transcript level), I
> >>> can use FIRMA to estimate the residual for each exon and compose a
> >>> score as described in the paper. My question is: if there is a SNP
> >>> difference between two strains within one exon, should I exclude
> >>> that exon from estimating the transcript-level value? My guess is
> >>> probably no.
> >>
> >> If the SNP affects only 1 probe in an entire transcript, I would
> >> expect it to have very little impact on the gene-level summary. And,
> >> especially so if there are a large number of total probes for that
> >> gene. It may have a noticeable effect on the probe effect.
> >>
> >>> So will it be a good idea if I exclude that exon after I calculate
> >>> all FIRMA scores, or should I exclude these exons after I estimate
> >>> residuals, but only use the residuals not affected by SNPs for
> >>> FIRMA score estimation? Thanks
> >>
> >> Keep in mind the residuals are calculated at the probe-level, not the
> >> probeset-level. The FIRMA score is then a summary of all the
> >> residuals for a probeset.
> >>
> >> I think you have (at least) 3 choices:
> >>
> >> 1. (preferred, I would think) you could remove all affected *probes*
> >> (via the creation of a SNP-affected-probe-free CDF) in advance, then
> >> run FIRMA as normal. I can help with this if you tell me which probes
> >> are affected.
> >>
> >> 2. remove the affected *probesets* afterwards, since you may not
> >> believe the FIRMA scores that are based on them.
> >>
> >> 3. as you suggested, only calculate FIRMA scores from unaffected
> >> residuals. But, the information you require to do this is the same
> >> information required to do #1 and it would seem like #1 is
> >> preferred.
> >>
> >> The good thing about option #1 is you would still have some ability to
> >> detect differential splicing for the probeset (instead of tossing it
> >> away), albeit with the smaller number of remaining unaffected probes.
> >>
> >> Cheers,
> >> Mark
> >>
> >>> Sabrina
> >>>
> >>> On Apr 30, 3:46 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >>>> Hi Sabrina.
> >>>>
> >>>> I have not had to deal with this myself, but I do know that it
> >>>> exists and I can at least suggest a possible route to exclude
> >>>> affected exons.
> >>>>
> >>>> Presumably, there is a database (dbSNP?) that tells you the genome
> >>>> locations of each SNP for your strains. There is also a probe.tab
> >>>> file from Affymetrix that gives you the mapped genome locations of
> >>>> each probe (or you could take the sequences from the same file and
> >>>> map them yourself with a tool like BLAT). It is then just a matter
> >>>> of looking whether each probe maps to a location on the genome that
> >>>> overlaps a SNP. There is probably a Bioconductor tool for this or
> >>>> you could create a hash, etc.
> >>>>
> >>>> There are a couple levels at which you might introduce this to your
> >>>> analysis. You could remove individual probes that are affected.
> >>>> On the aroma.affymetrix side, this would require creating a new CDF
> >>>> with those affected probes not included (a bit tricky but doable).
> >>>> Or, you could simply post-process your existing results and remove
> >>>> probesets that have an affected probe (easier but not as elegant).
> >>>>
> >>>> You might've also seen:
> >>>>
> >>>> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A
> >>>> database for filtering out probes in the Affymetrix GeneChip(R)
> >>>> Human Exon 1.0 ST array potentially affected by SNPs.
> >>>> Bioinformation 2008, 2(10):469-470.
> >>>>
> >>>> Hope that gets you started.
> >>>>
> >>>> Cheers,
> >>>> Mark
> >>>>
> >>>> On 30/04/2009, at 6:07 AM, sabrina wrote:
> >>>>
> >>>>> Hi, all:
> >>>>> I am using Aroma for detecting exon skipping events between two
> >>>>> groups (two different strains). I found out that several of my top
> >>>>> hits indeed include at least one SNP between the two strains. I
> >>>>> wonder if anyone has a suggestion about how to deal with this
> >>>>> situation. If I need to remove all affected exons from the
> >>>>> analysis, how can I do it? I have never worked with SNP data
> >>>>> before, can anyone give me a hint?
> >>>>> Thanks a lot!
> >>>>>
> >>>>> Sabrina
> >>>>
> >>>> ------------------------------
> >>>> Mark Robinson
> >>>> Epigenetics Laboratory, Garvan
> >>>> Bioinformatics Division, WEHI
> >>>> e: m.robin...@garvan.org.au
> >>>> e: mrobin...@wehi.edu.au
> >>>> p: +61 (0)3 9345 2628
> >>>> f: +61 (0)3 9347 0852
> >>>> ------------------------------
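
The steps discussed in this thread (hash the SNP positions, drop probes whose mapped interval overlaps a SNP, attach a Unit_ID to each probeset, write the flat file) could be sketched roughly as below. This is only an illustration in Python, not the aroma.affymetrix code, and it does not call flat2Cdf() itself. Several things are assumptions: the probe.tab header names are copied from the table quoted above and may differ in a real file, the order of the flat-file columns is a guess, SNP and probe coordinates are taken to use the same convention, the unit IDs "TC_A" and "TC_B" are made up, and probesets with no Unit_ID mapping are simply skipped.

```python
import csv
import io
from collections import defaultdict

# Column names mentioned in the thread; the order here is an assumption.
FLAT_COLUMNS = ["Probe_ID", "X", "Y", "Probe_Sequence", "Group_ID", "Unit_ID"]


def build_snp_index(snps):
    """Hash SNP positions by chromosome: {seqname: {position, ...}}."""
    index = defaultdict(set)
    for seqname, position in snps:
        index[seqname].add(position)
    return index


def overlaps_snp(seqname, start, stop, snp_index):
    """True if any SNP on seqname lies inside the probe interval (inclusive)."""
    lo, hi = sorted((start, stop))
    positions = snp_index.get(seqname, set())
    return any(p in positions for p in range(lo, hi + 1))


def flat_rows(probe_tab, snp_index, unit_of_probeset):
    """Yield flat-file rows for probes that are mapped and SNP-free."""
    for row in csv.DictReader(probe_tab, delimiter="\t"):
        unit = unit_of_probeset.get(row["Probe Set ID"])
        if unit is None:
            continue  # assumption: drop probesets without a Unit_ID mapping
        if overlaps_snp(row["seqname"], int(row["start"]), int(row["stop"]), snp_index):
            continue  # probe overlaps a SNP, filter it OUT
        yield {
            "Probe_ID": row["Probe ID"],
            "X": row["probe x"],
            "Y": row["probe y"],
            "Probe_Sequence": row["probe sequence"],
            "Group_ID": row["Probe Set ID"],
            "Unit_ID": unit,
        }


def write_flat(rows, handle):
    """Write the rows as a tab-delimited flat file with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FLAT_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Three probes taken from the table quoted in the thread.
PROBE_TAB = (
    "Probe ID\tProbe Set ID\tprobe x\tprobe y\tassembly\tseqname\tstart\tstop\t"
    "strand\tprobe sequence\ttarget strandedness\tcategory\n"
    "494998\t2315101\t917\t193\tbuild-34/hg16\tchr1\t1788\t1812\t+\t"
    "CACGGGAAGTCTGGGCTAAGAGACA\tSense\tmain\n"
    "1734213\t2315101\t1092\t677\tbuild-34/hg16\tchr1\t1973\t1997\t+\t"
    "ACAGGGGCCAGAAGATGAACAATGG\tSense\tmain\n"
    "5760145\t2315102\t144\t2250\tbuild-34/hg16\tchr1\t2520\t2544\t+\t"
    "TCGGCCGTCGTCTTCTGCAGCTCTG\tSense\tmain\n"
)

snp_index = build_snp_index([("chr1", 1800)])        # one made-up SNP position
units = {"2315101": "TC_A", "2315102": "TC_B"}        # hypothetical Unit_IDs
rows = list(flat_rows(io.StringIO(PROBE_TAB), snp_index, units))

out = io.StringIO()
write_flat(rows, out)
print(out.getvalue())
```

With the made-up SNP at chr1:1800, only probe 494998 (1788 to 1812) is dropped; the other two probes are written out with their Group_ID and Unit_ID. Skipping unmapped probesets is just one way to handle the missing-annotation problem raised at the top of the thread; another would be to fall back to some other identifier for the Unit_ID.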