Hi Sabrina. How about you try creating a 'flat' file like the one described at: http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch
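A minimal Python sketch of building such a flat file from the probe.tab excerpt discussed in this message. It assumes the file is tab-delimited with the header names shown in the excerpt. It also assumes the output columns are Probe_ID, X, Y, Probe_Sequence, Group_ID and Unit_ID in that order: the names come from this thread, but the order is an assumption to check against the linked page. Finally, it assumes you already have a probeset-to-transcript-cluster mapping and a set of SNP-affected probe IDs. The function names and the "TC_1"/"TC_2" cluster IDs are hypothetical. The resulting text is what would be handed to the flat2Cdf() script; its arguments are not shown here.

```python
import csv
import io

# Flat-file column names used in this thread; the ORDER is an assumption.
FLAT_COLUMNS = ["Probe_ID", "X", "Y", "Probe_Sequence", "Group_ID", "Unit_ID"]


def build_flat_rows(probe_tab, probeset_to_unit, snp_probes):
    """Convert probe.tab records into flat-file rows (hypothetical helper).

    probe_tab        -- text stream of the tab-delimited probe.tab (with header)
    probeset_to_unit -- dict: Probe Set ID -> transcript cluster ID (assumed input)
    snp_probes       -- set of Probe IDs that overlap a SNP (these are dropped)
    """
    rows = []
    for rec in csv.DictReader(probe_tab, delimiter="\t"):
        probe_id = rec["Probe ID"]
        if probe_id in snp_probes:
            continue  # filter OUT SNP-affected probes
        group_id = rec["Probe Set ID"]  # probeset ID becomes Group_ID
        unit_id = probeset_to_unit.get(group_id)  # transcript cluster becomes Unit_ID
        if unit_id is None:
            continue  # no transcript cluster known: skipped in this sketch
        rows.append({
            "Probe_ID": probe_id,
            "X": rec["probe x"],
            "Y": rec["probe y"],
            "Probe_Sequence": rec["probe sequence"],
            "Group_ID": group_id,
            "Unit_ID": unit_id,
        })
    return rows


def to_flat_text(rows):
    """Serialise rows as a tab-delimited flat file with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FLAT_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Three records copied from the probe.tab excerpt in this thread.
EXAMPLE_TAB = (
    "Probe ID\tProbe Set ID\tprobe x\tprobe y\tassembly\tseqname\tstart\tstop\t"
    "strand\tprobe sequence\ttarget strandedness\tcategory\n"
    "494998\t2315101\t917\t193\tbuild-34/hg16\tchr1\t1788\t1812\t+\t"
    "CACGGGAAGTCTGGGCTAAGAGACA\tSense\tmain\n"
    "4767517\t2315101\t796\t1862\tbuild-34/hg16\tchr1\t1992\t2016\t+\t"
    "ATTAAGTTACATGCAGACAACAGGG\tSense\tmain\n"
    "5760145\t2315102\t144\t2250\tbuild-34/hg16\tchr1\t2520\t2544\t+\t"
    "TCGGCCGTCGTCTTCTGCAGCTCTG\tSense\tmain\n"
)

if __name__ == "__main__":
    # Hypothetical transcript cluster IDs and SNP list, for illustration only.
    mapping = {"2315101": "TC_1", "2315102": "TC_2"}
    rows = build_flat_rows(io.StringIO(EXAMPLE_TAB), mapping, {"4767517"})
    print(to_flat_text(rows), end="")
```

Probesets missing from the mapping are skipped here because every CDF unit needs a Unit_ID. Whether dropping them is right depends on which annotation (Ensembl or Affymetrix) you map against.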
Presumably you are comfortable with the Exon Array's 'probe.tab' file by now, and possibly the Affymetrix annotation CSV file, so you should have access to all of this information. For example, from the following table:

    mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
    Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
    494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
    1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACAGGGGCCAGAAGATGAACAATGG  Sense                main
    4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
    4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
    5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
    671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
    4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
    4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
    5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
    [snip]

...you can see that the second column is the probeset ID, which would be used as the "Group_ID" column of your flat file. Depending on whether you are using the Ensembl CDF or the Affymetrix annotation, you would need to create a mapping to get the transcript cluster ID column (here, the "Unit_ID"). Everything else you need (Probe_Sequence, X, Y, Probe_ID) is in the table above. Then it is just a matter of filtering OUT the probes that overlap a SNP, which you should already have a list of from your mapping exercise. Finally, call the flat2Cdf() script and hopefully you'll be off and running.

Let me know how you go.

Cheers,
Mark

On 10/06/2009, at 1:00 PM, sabrina wrote:

>
> Thanks, Mark!
> Can you show me / walk me through how to get a new SNP-free CDF? I
> finally got the right version of the SNP and probe mapping, so I am
> ready to try it out!
>
> Sabrina
>
> On Jun 6, 3:14 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>> Hi Sabrina.
>>
>> Comments below.
>>
>> On 06/06/2009, at 1:57 AM, sabrina wrote:
>>
>>> Hi, Mark:
>>> I finally found the SNP data set that is suitable for my case. As I
>>> understand it, aroma uses RMA to estimate gene-level and exon-level
>>> intensities. After I estimate the gene-level (transcript-level)
>>> values, I can use FIRMA to estimate the residuals for each exon and
>>> compose a score as described in the paper. My question is: if there
>>> is a SNP difference between the two strains within one exon, should
>>> I exclude that exon when estimating the transcript-level value? My
>>> guess is probably not.
>>
>> If the SNP affects only 1 probe in an entire transcript, I would
>> expect it to have very little impact on the gene-level summary,
>> especially if there are a large number of probes in total for that
>> gene. It may have a noticeable effect on the probe effect.
>>
>>> So would it be a good idea to exclude that exon after I calculate
>>> all the FIRMA scores, or should I exclude these exons after I
>>> estimate the residuals, using only the residuals not affected by
>>> SNPs for the FIRMA score estimation? Thanks
>>
>> Keep in mind the residuals are calculated at the probe level, not the
>> probeset level. The FIRMA score is then a summary of all the
>> residuals for a probeset.
>>
>> I think you have (at least) 3 choices:
>>
>> 1. (preferred, I would think) Remove all affected *probes* (via the
>> creation of a SNP-affected-probe-free CDF) in advance, then run FIRMA
>> as normal. I can help with this if you tell me which probes are
>> affected.
>>
>> 2. Remove the affected *probesets* afterwards, since you may not
>> trust the FIRMA scores that are based on them.
>>
>> 3. As you suggested, calculate FIRMA scores only from the unaffected
>> residuals. But the information you need to do this is the same
>> information required for #1, so #1 seems preferable.
>>
>> The good thing about option #1 is that you would still have some
>> ability to detect differential splicing for the probeset (instead of
>> tossing it away), albeit with the smaller number of remaining
>> unaffected probes.
>>
>> Cheers,
>> Mark
>>
>>> Sabrina
>>
>>> On Apr 30, 3:46 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>>>> Hi Sabrina.
>>>>
>>>> I have not had to deal with this myself, but I do know that the
>>>> issue exists, and I can at least suggest a possible route to
>>>> exclude affected exons.
>>>>
>>>> Presumably, there is a database (dbSNP?) that tells you the genome
>>>> locations of each SNP for your strains. There is also a probe.tab
>>>> file from Affymetrix that gives you the mapped genome locations of
>>>> each probe (or you could take the sequences from the same file and
>>>> map them yourself with a tool like BLAT). It is then just a matter
>>>> of checking whether each probe maps to a location on the genome
>>>> that overlaps a SNP. There is probably a Bioconductor tool for
>>>> this, or you could create a hash, etc.
>>>>
>>>> There are a couple of levels at which you might introduce this into
>>>> your analysis. You could remove the individual probes that are
>>>> affected. On the aroma.affymetrix side, this would require creating
>>>> a new CDF with those affected probes excluded (a bit tricky but
>>>> doable). Or, you could simply post-process your existing results
>>>> and remove probesets that have an affected probe (easier but not as
>>>> elegant).
>>>>
>>>> You might've also seen:
>>>>
>>>> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A
>>>> database for filtering out probes in the Affymetrix GeneChip(R)
>>>> HumanExon1.0 ST array potentially affected by SNPs.
>>>> Bioinformation 2008, 2(10):469-470.
>>>>
>>>> Hope that gets you started.
>>>>
>>>> Cheers,
>>>> Mark
>>>>
>>>> On 30/04/2009, at 6:07 AM, sabrina wrote:
>>>>
>>>>> Hi, all:
>>>>> I am using Aroma for detecting exon skipping events between two
>>>>> groups (two different strains). I found that several of my top hits
>>>>> indeed include at least one SNP between the two strains. I wonder
>>>>> if anyone has suggestions about how to deal with this situation. If
>>>>> I need to remove all affected exons from the analysis, how can I do
>>>>> it? I have never worked with SNP data before; can anyone give me a
>>>>> hint? Thanks a lot!
>>>>>
>>>>> Sabrina
>>>>
>>>> ------------------------------
>>>> Mark Robinson
>>>> Epigenetics Laboratory, Garvan
>>>> Bioinformatics Division, WEHI
>>>> e: m.robin...@garvan.org.au
>>>> e: mrobin...@wehi.edu.au
>>>> p: +61 (0)3 9345 2628
>>>> f: +61 (0)3 9347 0852
>>>> ------------------------------

You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group.
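The overlap check suggested in the April reply (for each probe, look up whether its mapped genomic interval contains a SNP, for example with a hash) could be sketched in Python as below. It assumes SNPs are given as (seqname, position) pairs on the same assembly and in the same coordinate convention as the probe coordinates (build-34/hg16 in the probe.tab excerpt). It also assumes probe start/stop are inclusive, as the 25-base spans in the excerpt (e.g. 1788 to 1812) suggest. The SNP positions and function names are hypothetical. The resulting set of probe IDs is what you would filter out before building the flat file.

```python
from collections import defaultdict

# Four probes (ID, seqname, start, stop) from the probe.tab excerpt in this thread.
EXAMPLE_PROBES = [
    ("494998", "chr1", 1788, 1812),
    ("1734213", "chr1", 1973, 1997),
    ("4767517", "chr1", 1992, 2016),
    ("4286427", "chr1", 2006, 2030),
]


def index_snps(snps):
    """Hash SNP positions by chromosome: {seqname: set of positions}."""
    index = defaultdict(set)
    for seqname, pos in snps:
        index[seqname].add(pos)
    return index


def snp_affected_probes(probes, snp_index):
    """Return the IDs of probes whose inclusive [start, stop] span contains a SNP.

    Probes are short (25 bases here), so checking every base against the
    hash costs only about 25 set lookups per probe.
    """
    hits = set()
    for probe_id, seqname, start, stop in probes:
        positions = snp_index.get(seqname)  # .get() does not add empty keys
        if positions and any(p in positions for p in range(start, stop + 1)):
            hits.add(probe_id)
    return hits


if __name__ == "__main__":
    # Hypothetical SNP positions, for illustration only.
    snps = index_snps([("chr1", 2000), ("chr1", 5000)])
    print(sorted(snp_affected_probes(EXAMPLE_PROBES, snps)))  # prints ['4767517']
```

For a genome-wide SNP set, an interval-overlap tool (the April reply suggests Bioconductor probably has one) may scale better; the hash version is simply the easiest to write down.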