[aroma.affymetrix] Re: SNPs affecting EXon splicing detection

Mark Robinson Thu, 11 Jun 2009 06:41:33 -0700

Hi Sabrina.

The Unit_ID can be any "transcript cluster" identifier of your  
choice.  The easiest may be to use the Affymetrix transcript cluster  
identifier itself ... available from:


http://www.affymetrix.com/analysis/downloads/current_exon/MoEx-1_0-st-v1.mm9.probeset.csv.zip

See the 'transcript_cluster_id' column.  Perhaps only take the "core"
probes, as defined in the 'level' column?
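
A minimal sketch of that lookup, in Python rather than R and using only the standard library. The 'transcript_cluster_id' and 'level' columns are the ones named above; the 'probeset_id' column name and the '#'-prefixed comment header are assumptions about the CSV layout, and the rows are made up purely to show the shape of the input.

```python
import csv
import io

def load_core_mapping(handle):
    """Return {probeset_id: transcript_cluster_id} for rows whose level is 'core'."""
    # Assumption: the annotation CSV begins with '#'-prefixed comment lines.
    rows = csv.DictReader(line for line in handle if not line.startswith("#"))
    return {
        row["probeset_id"]: row["transcript_cluster_id"]
        for row in rows
        if row["level"] == "core"
    }

# Made-up excerpt standing in for the unzipped annotation CSV.
demo = io.StringIO(
    "#%comment line\n"
    '"probeset_id","transcript_cluster_id","level"\n'
    '"1001","9001","core"\n'
    '"1002","9001","extended"\n'
    '"1003","9002","core"\n'
)
core_map = load_core_mapping(demo)
print(core_map)
```

The resulting dictionary can then supply the Unit_ID for each probeset, and any probeset missing from it is simply not "core".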

Note: we used Ensembl in that flat2Cdf() example since we were using a  
custom organization (i.e. non-Affy) of the probesets.

Cheers,
Mark


On 11/06/2009, at 10:58 PM, sabrina wrote:

>
> Hi, Mark:
> for the Unit_ID, does it have to be an Ensembl gene ID like ENSMUSGxxxx?
> Lots of genes do not have an Ensembl assignment in the Affy annotation
> file. There are lots of missing annotations, and I still have not found
> a good way to deal with them. Do you have any suggestions?
>
> Thanks
>
> Sabrina
>
> On Jun 10, 12:32 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>> Hi Sabrina.
>>
>> How about you try to create a 'flat' file like the one described at:
>> http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file
>> ...
>>
>> Presumably, you will be comfortable with the Exon Array's 'probetab'
>> file by now, and possibly the Affymetrix annotation CSV file, so you
>> should have access to all this information.
>>
>> For example, from the following table:
>>
>> mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
>> Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
>> 494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
>> 1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACAGGGGCCAGAAGATGAACAATGG  Sense                main
>> 4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
>> 4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
>> 5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
>> 671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
>> 4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
>> 4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
>> 5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
>> [snip]
>>
>> ... you see the second column is the probeset_id, which would be used
>> as the "Group_ID" column for your flat file.  Depending on whether you
>> are using the Ensembl CDF or the Affymetrix annotation, you would need
>> to create a mapping to get the transcript cluster id column (here, the
>> "Unit_ID").  Everything else you need (Probe_Sequence, X, Y, Probe_ID)
>> is within the table above.
>>
>> Then, it would be just a matter of filtering OUT those probes that
>> overlap a SNP, which, based on your mapping exercise, you must have a
>> list of.  Then, make a call to the flat2Cdf() script and hopefully
>> you'll be off and running.
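
The flat-file step described above might be sketched as follows, in Python with the standard library only. The input header names come from the probe.tab listing and the output names (Probe_ID, X, Y, Probe_Sequence, Group_ID, Unit_ID) from the text; the output column order, the `unit_of` mapping and the `snp_probe_ids` set are hypothetical placeholders, so check the exact layout flat2Cdf() expects before relying on this.

```python
import csv
import io

# Output columns named in the thread; their order here is an assumption.
FLAT_COLUMNS = ["Probe_ID", "X", "Y", "Probe_Sequence", "Group_ID", "Unit_ID"]

def build_flat_rows(probe_tab, unit_of, snp_probe_ids):
    """Yield flat-file rows, skipping SNP-affected and unmapped probes."""
    for row in csv.DictReader(probe_tab, delimiter="\t"):
        probeset_id = row["Probe Set ID"]
        if row["Probe ID"] in snp_probe_ids or probeset_id not in unit_of:
            continue
        yield {
            "Probe_ID": row["Probe ID"],
            "X": row["probe x"],
            "Y": row["probe y"],
            "Probe_Sequence": row["probe sequence"],
            "Group_ID": probeset_id,           # probeset id
            "Unit_ID": unit_of[probeset_id],   # transcript cluster id
        }

# Three probes copied from the table above (only the columns used here).
probe_tab = io.StringIO(
    "Probe ID\tProbe Set ID\tprobe x\tprobe y\tprobe sequence\n"
    "494998\t2315101\t917\t193\tCACGGGAAGTCTGGGCTAAGAGACA\n"
    "1734213\t2315101\t1092\t677\tACAGGGGCCAGAAGATGAACAATGG\n"
    "5760145\t2315102\t144\t2250\tTCGGCCGTCGTCTTCTGCAGCTCTG\n"
)
unit_of = {"2315101": "TC_demo", "2315102": "TC_demo"}  # hypothetical mapping
snp_probe_ids = {"1734213"}                             # hypothetical SNP hit

flat_rows = list(build_flat_rows(probe_tab, unit_of, snp_probe_ids))

# Write the surviving probes as a tab-delimited flat file.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=FLAT_COLUMNS, delimiter="\t",
                        lineterminator="\n")
writer.writeheader()
writer.writerows(flat_rows)
print(out.getvalue())
```

Probesets with no entry in the mapping are dropped here rather than given a placeholder Unit_ID; that is a design choice of the sketch, not something the thread prescribes.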
>>
>> Let me know how you go.
>>
>> Cheers,
>> Mark
>>
>> On 10/06/2009, at 1:00 PM, sabrina wrote:
>>
>>
>>
>>
>>
>>> Thanks, Mark!
>>> Can you walk me through how to get a new SNP-free CDF? I
>>> finally got the right version of the SNP and probe mapping, so I am ready
>>> to try it out!
>>
>>> Sabrina
>>
>>> On Jun 6, 3:14 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>>>> Hi Sabrina.
>>
>>>> Comments below.
>>
>>>> On 06/06/2009, at 1:57 AM, sabrina wrote:
>>
>>>>> Hi, Mark:
>>>>> I finally found the SNP data set that is suitable for my case. As I
>>>>> understand it, aroma uses RMA to estimate gene-level and exon-level
>>>>> intensities. After I estimate the gene level (transcript level), I can
>>>>> use FIRMA to estimate the residual for each exon and compose a score as
>>>>> described in the paper. My question is: if there is a SNP difference
>>>>> between two strains within one exon, should I exclude that exon from
>>>>> estimating the transcript-level value? My guess is probably no.
>>
>>>> If the SNP affects only 1 probe in an entire transcript, I would
>>>> expect it to have very little impact on the gene-level summary,
>>>> especially if there are a large number of total probes for that
>>>> gene.  It may, however, have a noticeable effect on the estimated
>>>> probe effect for that probe.
>>
>>>>> So would it
>>>>> be a good idea to exclude that exon after I calculate all FIRMA
>>>>> scores, or should I exclude these exons after I estimate residuals
>>>>> and use only the residuals not affected by SNPs for the FIRMA score
>>>>> estimation? Thanks
>>
>>>> Keep in mind the residuals are calculated at the probe level, not the
>>>> probeset level.  The FIRMA score is then a summary of all the
>>>> residuals for a probeset.
>>
>>>> I think you have (at least) 3 choices:
>>
>>>> 1. (preferred, I would think) you could remove all affected *probes*
>>>> (via the creation of a SNP-affected-probe-free CDF) in advance, then
>>>> run FIRMA as normal.  I can help with this if you tell me which
>>>> probes are affected.
>>
>>>> 2. remove the affected *probesets* afterwards, since you may not
>>>> believe the FIRMA scores that are based on them.
>>
>>>> 3. as you suggested, only calculate FIRMA scores from unaffected
>>>> residuals.  But the information you require to do this is the same
>>>> information required to do #1, and it would seem like #1 is
>>>> preferred.
>>
>>>> The good thing about option #1 is you would still have some ability
>>>> to detect differential splicing for the probeset (instead of tossing
>>>> it away), albeit with the smaller number of remaining unaffected
>>>> probes.
>>
>>>> Cheers,
>>>> Mark
>>
>>>>> Sabrina
>>
>>>>> On Apr 30, 3:46 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
>>>>>> Hi Sabrina.
>>
>>>>>> I have not had to deal with this myself, but I do know that the
>>>>>> issue exists and I can at least suggest a possible route to exclude
>>>>>> affected exons.
>>
>>>>>> Presumably, there is a database (dbSNP?) that tells you the genome
>>>>>> locations of each SNP for your strains.  There is also a probe.tab
>>>>>> file from Affymetrix that gives you the mapped genome locations of
>>>>>> each probe (or you could take the sequences from the same file and
>>>>>> map them yourself with a tool like BLAT).  It is then just a matter
>>>>>> of checking whether each probe maps to a location on the genome that
>>>>>> overlaps a SNP.  There is probably a Bioconductor tool for this, or
>>>>>> you could create a hash, etc.
>>
>>>>>> There are a couple of levels at which you might introduce this to your
>>>>>> analysis.  You could remove individual probes that are affected.  On
>>>>>> the aroma.affymetrix side, this would require creating a new CDF with
>>>>>> those affected probes not included (a bit tricky but doable).  Or, you
>>>>>> could simply post-process your existing results and remove probesets
>>>>>> that have an affected probe (easier but not as elegant).
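
The post-processing route (drop every probeset that contains an affected probe) is short enough to sketch in Python. The `scores` dictionary stands in for whatever per-probeset results you already have; its values and the affected-probe set are made up, while the probe-to-probeset pairs come from the probe.tab excerpt quoted earlier in this thread.

```python
def drop_affected_probesets(scores, probeset_of, affected_probes):
    """Return `scores` without any probeset that contains an affected probe."""
    tainted = {probeset_of[p] for p in affected_probes if p in probeset_of}
    return {ps: s for ps, s in scores.items() if ps not in tainted}

# probe_id -> probeset_id
probeset_of = {
    "494998": "2315101",
    "1734213": "2315101",
    "5760145": "2315102",
    "5388": "2315103",
}
scores = {"2315101": 1.5, "2315102": -0.5, "2315103": 0.25}  # made-up results

clean = drop_affected_probesets(scores, probeset_of, {"1734213"})
print(clean)
```

Note that probeset 2315101 is lost entirely even though only one of its probes is affected, which is the cost of this simpler approach.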
>>
>>>>>> You might've also seen:
>>
>>>>>> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A
>>>>>> database for filtering out probes in the Affymetrix GeneChip(R)
>>>>>> Human Exon 1.0 ST array potentially affected by SNPs.
>>>>>> Bioinformation 2008, 2(10):469-470.
>>
>>>>>> Hope that gets you started.
>>
>>>>>> Cheers,
>>>>>> Mark
>>
>>>>>> On 30/04/2009, at 6:07 AM, sabrina wrote:
>>
>>>>>>> Hi, all:
>>>>>>> I am using Aroma for detecting exon skipping events between two
>>>>>>> groups (two different strains). I found out that several of my top hits
>>>>>>> include at least one SNP between the two strains. I wonder if
>>>>>>> anyone has suggestions about how to deal with this situation.
>>>>>>> If I need to remove all affected exons from the analysis, how can I do
>>>>>>> it? I have never worked with SNP data before; can anyone give me a hint?
>>>>>>> Thanks a lot!
>>
>>>>>>> Sabrina
>>
>>>>>> ------------------------------
>>>>>> Mark Robinson
>>>>>> Epigenetics Laboratory, Garvan
>>>>>> Bioinformatics Division, WEHI
>>>>>> e: m.robin...@garvan.org.au
>>>>>> e: mrobin...@wehi.edu.au
>>>>>> p: +61 (0)3 9345 2628
>>>>>> f: +61 (0)3 9347 0852
>>>>>> ------------------------------
>>
>>>> ------------------------------
>>>> Mark Robinson, PhD (Melb)
>>>> Epigenetics Laboratory, Garvan
>>>> Bioinformatics Division, WEHI
>>>> e: m.robin...@garvan.org.au
>>>> e: mrobin...@wehi.edu.au
>>>> p: +61 (0)3 9345 2628
>>>> f: +61 (0)3 9347 0852
>>>> ------------------------------
>>
> >

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robin...@garvan.org.au
e: mrobin...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------





