[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

Mark Robinson Tue, 09 Jun 2009 21:32:24 -0700

Hi Sabrina.

How about you try and create a 'flat' file like the one described at:
http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-files-from-scratch


Presumably you are comfortable with the Exon Array's probe.tab file by
now, and possibly the Affymetrix annotation CSV file as well, so you
should have access to all of this information.

For example, from the following table:

mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACAGGGGCCAGAAGATGAACAATGG  Sense                main
4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
[snip]

... you see the second column is the probeset_id, which would be used  
as the "Group_ID" column for your flat file.  Depending on whether you  
are using the Ensembl CDF or the Affymetrix annotation, you would need  
to create a mapping to get the transcript cluster id column (here, the  
"Unit_ID").  Everything else you need (Probe_Sequence, X, Y, Probe_ID)  
is within the table above.
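
To make the column mapping concrete, here is a rough sketch in Python
(used purely for illustration; it is not part of aroma.affymetrix). It
assumes the probe.tab header names are exactly those shown above and
that you have already built a probeset-to-transcript-cluster lookup;
the names probe_tab_to_flat, write_flat and probeset_to_unit are made
up for this example, and the column order should be checked against
the flat-file description at the link above.

```python
import csv

# Column names mentioned in this thread; the order is an assumption.
FLAT_COLUMNS = ["Probe_ID", "X", "Y", "Probe_Sequence", "Group_ID", "Unit_ID"]


def probe_tab_to_flat(probe_tab_lines, probeset_to_unit):
    """Yield one flat-file row (a dict) per probe in a tab-delimited probe.tab.

    probe_tab_lines: any iterable of text lines, e.g. an open file.
    probeset_to_unit: hypothetical dict mapping probeset id -> transcript
    cluster id. Probes whose probeset is missing from it are skipped.
    """
    reader = csv.DictReader(probe_tab_lines, delimiter="\t")
    for row in reader:
        probeset = row["Probe Set ID"]
        unit = probeset_to_unit.get(probeset)
        if unit is None:
            continue  # no transcript cluster known for this probeset
        yield {
            "Probe_ID": row["Probe ID"],
            "X": row["probe x"],
            "Y": row["probe y"],
            "Probe_Sequence": row["probe sequence"],
            "Group_ID": probeset,
            "Unit_ID": unit,
        }


def write_flat(rows, out):
    """Write flat-file rows to an open text handle as tab-delimited text."""
    writer = csv.DictWriter(out, fieldnames=FLAT_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

Skipping probes with no known transcript cluster is just one possible
choice; you may prefer to fail loudly instead so that gaps in the
mapping are noticed.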

Then it is just a matter of filtering OUT the probes that overlap a
SNP; you must already have a list of these from your mapping exercise.
Finally, make a call to the flat2Cdf() script and hopefully you'll be
off and running.
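
If the list of affected probes still has to be derived from genome
coordinates, the overlap test could look roughly like the sketch below
(Python, for illustration only). It assumes the SNP positions are in
the same assembly and coordinate convention as the probe.tab
start/stop columns, which needs checking for real data; index_snps,
overlaps_snp and drop_snp_probes are hypothetical helper names.

```python
from bisect import bisect_left
from collections import defaultdict


def index_snps(snps):
    """Group SNP positions by chromosome and sort them for binary search.

    snps: iterable of (seqname, position) pairs.
    """
    by_chrom = defaultdict(list)
    for seqname, pos in snps:
        by_chrom[seqname].append(int(pos))
    return {chrom: sorted(positions) for chrom, positions in by_chrom.items()}


def overlaps_snp(seqname, start, stop, snp_index):
    """True if any SNP lies in the closed interval [start, stop] on seqname."""
    positions = snp_index.get(seqname, [])
    # First SNP at or after the probe start; it overlaps if it is <= stop.
    i = bisect_left(positions, start)
    return i < len(positions) and positions[i] <= stop


def drop_snp_probes(probes, snp_index):
    """Keep only probes (dicts with seqname/start/stop) that overlap no SNP."""
    return [
        p for p in probes
        if not overlaps_snp(p["seqname"], int(p["start"]), int(p["stop"]),
                            snp_index)
    ]
```

The surviving probes are the ones that would go into the flat file
before flat2Cdf() is run.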

Let me know how you go.

Cheers,
Mark

On 10/06/2009, at 1:00 PM, sabrina wrote:

>
> Thanks, Mark!
> Can you walk me through how to get a new SNP-free CDF? I finally
> got the right version of the SNP and probe mapping, so I am ready
> to try it out!
>
> Sabrina
>
> On Jun 6, 3:14 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>> Hi Sabrina.
>>
>> Comments below.
>>
>> On 06/06/2009, at 1:57 AM, sabrina wrote:
>>
>>
>>
>>> Hi, Mark:
>>> I finally found the SNP data set that is suitable for my case. As I
>>> understand, aroma uses RMA to estimate gene-level and exon-level
>>> intensities. After I estimate the gene-level (transcript-level)
>>> values, I can use FIRMA to estimate residuals for each exon and
>>> compose a score as described in the paper. My question is: if there
>>> is a SNP difference between two strains within one exon, should I
>>> exclude that exon from estimating the transcript-level value? My
>>> guess is probably no.
>>
>> If the SNP affects only 1 probe in an entire transcript, I would
>> expect it to have very little impact on the gene-level summary.  And,
>> especially so if there are a large number of total probes for that
>> gene.  It may have a noticeable effect on the probe effect.
>>
>>> So would it be a good idea to exclude that exon after I calculate
>>> all FIRMA scores, or should I estimate the residuals first and then
>>> use only the residuals not affected by SNPs for the FIRMA score
>>> estimation? Thanks
>>
>> Keep in mind the residuals are calculated at the probe-level, not the
>> probeset-level.  The FIRMA score is then a summary of all the
>> residuals for a probeset.
>>
>> I think you have (at least) 3 choices:
>>
>> 1. (preferred, I would think) you could remove all affected *probes*
>> (via the creation of a SNP-affected-probe-free CDF) in advance, then
>> run FIRMA as normal.  I can help with this if you tell me which
>> probes are affected.
>>
>> 2. remove the affected *probesets* afterwards, since you may not
>> trust the FIRMA scores that are based on affected probes.
>>
>> 3. as you suggested, only calculate FIRMA scores from unaffected
>> residuals.  But the information you require to do this is the same
>> information required to do #1, and it would seem that #1 is
>> preferred.
>>
>> The good thing about option #1 is you would still have some ability
>> to detect differential splicing for the probeset (instead of tossing
>> it away), albeit with the smaller number of remaining unaffected
>> probes.
>>
>> Cheers,
>> Mark
>>
>>
>>
>>> Sabrina
>>
>>> On Apr 30, 3:46 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>>>> Hi Sabrina.
>>
>>>> I have not had to deal with this myself, but I do know that it
>>>> exists and I can at least suggest a possible route to exclude
>>>> affected exons.
>>
>>>> Presumably, there is a database (dbSNP?) that tells you the genome
>>>> locations of each SNP for your strains.  There is also a probe.tab
>>>> file from Affymetrix that gives you the mapped genome locations of
>>>> each probe (or you could take the sequences from the same file and
>>>> map them yourself with a tool like BLAT).  It is then just a matter
>>>> of looking whether each probe maps to a location on the genome that
>>>> overlaps a SNP.  There is probably a Bioconductor tool for this or
>>>> you could create a hash, etc.
>>
>>>> There are a couple of levels at which you might introduce this to
>>>> your analysis.  You could remove individual probes that are
>>>> affected.  On the aroma.affymetrix side, this would require
>>>> creating a new CDF with those affected probes not included (a bit
>>>> tricky but doable).  Or, you could simply post-process your
>>>> existing results and remove probesets that have an affected probe
>>>> (easier but not as elegant).
>>
>>>> You might've also seen:
>>
>>>> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A
>>>> database for filtering out probes in the Affymetrix GeneChip(R)
>>>> Human Exon 1.0 ST array potentially affected by SNPs.
>>>> Bioinformation 2008, 2(10):469-470.
>>
>>>> Hope that gets you started.
>>
>>>> Cheers,
>>>> Mark
>>
>>>> On 30/04/2009, at 6:07 AM, sabrina wrote:
>>
>>>>> Hi, all:
>>>>> I am using Aroma for detecting exon skipping events between two
>>>>> groups (two different strains). I found out that several of my
>>>>> top hits do indeed include at least one SNP between the two
>>>>> strains. I wonder if anyone has a suggestion about how to deal
>>>>> with this situation. If I need to remove all affected exons from
>>>>> the analysis, how can I do it? I have never worked with SNP data
>>>>> before; can anyone give me a hint? Thanks a lot!
>>
>>>>> Sabrina
>>
>>>> ------------------------------
>>>> Mark Robinson
>>>> Epigenetics Laboratory, Garvan
>>>> Bioinformatics Division, WEHI
>>>> e: m.robin...@garvan.org.au
>>>> e: mrobin...@wehi.edu.au
>>>> p: +61 (0)3 9345 2628
>>>> f: +61 (0)3 9347 0852
>>>> ------------------------------
>>
>> ------------------------------
>> Mark Robinson, PhD (Melb)
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: m.robin...@garvan.org.au
>> e: mrobin...@wehi.edu.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>> ------------------------------
> >

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------





