Hi, Mark:
For the Unit_ID, does it have to be an Ensembl gene ID like ENSMUSGxxxx?
Many genes have no Ensembl assignment in the Affy annotation
file. There are lots of missing annotations, and I still have not found
a good way to deal with them. Do you have any suggestions?

Thanks

Sabrina

On Jun 10, 12:32 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> Hi Sabrina.
>
> How about you try to create a 'flat' file like the one described at:
> http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file...
>
> Presumably, you will be comfortable with the Exon Array's 'probetab'  
> file by now and possibly the Affymetrix annotation CSV file and so you  
> should have access to all this information.
>
> For example, from the following table:
>
> mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
> Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
> 494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
> 1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACAGGGGCCAGAAGATGAACAATGG  Sense                main
> 4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
> 4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
> 5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
> 671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
> 4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
> 4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
> 5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
> [snip]
>
> ... you see the second column is the probeset_id, which would be used  
> as the "Group_ID" column for your flat file.  Depending on whether you  
> are using the Ensembl CDF or the Affymetrix annotation, you would need  
> to create a mapping to get the transcript cluster id column (here, the  
> "Unit_ID").  Everything else you need (Probe_Sequence, X, Y, Probe_ID)  
> is within the table above.
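
The column mapping described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the output column order, the function names, and the unit label "UNIT_A" are hypothetical, and the probeset-to-unit mapping has to come from whichever annotation you choose. Probesets with no unit in the mapping are skipped and reported rather than guessed.

```python
import csv
import io

# Assumed flat-file column order; check it against the flat-file description.
FLAT_COLUMNS = ["Probe_ID", "X", "Y", "Probe_Sequence", "Group_ID", "Unit_ID"]

def probetab_to_flat(probetab_lines, probeset_to_unit):
    """Turn probe.tab lines (header included) into flat-file rows.

    probeset_to_unit maps a probeset id (string) to its unit id.
    Returns (rows, unmapped): unmapped lists probesets without a unit.
    """
    reader = csv.DictReader(probetab_lines, delimiter="\t")
    rows, unmapped = [], set()
    for rec in reader:
        probeset = rec["Probe Set ID"]
        unit = probeset_to_unit.get(probeset)
        if unit is None:
            unmapped.add(probeset)  # no annotation: leave out, report later
            continue
        rows.append({
            "Probe_ID": rec["Probe ID"],
            "X": rec["probe x"],
            "Y": rec["probe y"],
            "Probe_Sequence": rec["probe sequence"],
            "Group_ID": probeset,   # probeset id becomes the group
            "Unit_ID": unit,        # transcript cluster / gene becomes the unit
        })
    return rows, sorted(unmapped)

def write_flat(rows, handle):
    """Write flat-file rows as tab-separated text with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FLAT_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Two rows copied from the probe.tab listing above.
SAMPLE_PROBETAB = (
    "Probe ID\tProbe Set ID\tprobe x\tprobe y\tassembly\tseqname\tstart\tstop"
    "\tstrand\tprobe sequence\ttarget strandedness\tcategory\n"
    "494998\t2315101\t917\t193\tbuild-34/hg16\tchr1\t1788\t1812\t+"
    "\tCACGGGAAGTCTGGGCTAAGAGACA\tSense\tmain\n"
    "5760145\t2315102\t144\t2250\tbuild-34/hg16\tchr1\t2520\t2544\t+"
    "\tTCGGCCGTCGTCTTCTGCAGCTCTG\tSense\tmain\n"
)

# "UNIT_A" is a made-up unit id used only for this demonstration.
demo_rows, demo_unmapped = probetab_to_flat(
    SAMPLE_PROBETAB.splitlines(), {"2315101": "UNIT_A"})
demo_out = io.StringIO()
write_flat(demo_rows, demo_out)
print(demo_out.getvalue())
print("probesets without a unit:", demo_unmapped)
```

Skipping unmapped probesets is one possible choice; another would be to give each of them its own single-group unit.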
>
> Then, it is just a matter of filtering OUT those probes that
> overlap a SNP; from your mapping exercise, you should already have a
> list of these.  Then, make a call to the flat2Cdf() script and hopefully
> you'll be off and running.
>
> Let me know how you go.
>
> Cheers,
> Mark
>
> On 10/06/2009, at 1:00 PM, sabrina wrote:
>
> > Thanks, Mark!
> > Can you walk me through how to get a new SNP-free CDF? I
> > finally got the right version of the SNP and probe mapping, so I am ready
> > to try it out!
>
> > Sabrina
>
> > On Jun 6, 3:14 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >> Hi Sabrina.
>
> >> Comments below.
>
> >> On 06/06/2009, at 1:57 AM, sabrina wrote:
>
> >>> Hi, Mark:
> >>> I finally found the SNP data set that is suitable for my case. As I
> >>> understand it, aroma uses RMA to estimate gene-level and exon-level
> >>> intensities. After I estimate the gene level (transcript level), I can
> >>> use FIRMA to estimate residuals for each exon and compose a score as
> >>> described in the paper. My question is: if there is a SNP difference
> >>> between two strains within one exon, should I exclude that exon from
> >>> estimating the transcript-level value? My guess is probably not.
>
> >> If the SNP affects only 1 probe in an entire transcript, I would
> >> expect it to have very little impact on the gene-level summary.  And,
> >> especially so if there are a large number of total probes for that
> >> gene.  It may have a noticeable effect on the probe effect.
>
> >>> So would it
> >>> be a good idea to exclude that exon after I calculate all FIRMA
> >>> scores, or should I exclude these exons after I estimate residuals
> >>> and use only the residuals not affected by SNPs for FIRMA score
> >>> estimation? Thanks
>
> >> Keep in mind the residuals are calculated at the probe-level, not the
> >> probeset-level.  The FIRMA score is then a summary of all the
> >> residuals for a probeset.
>
> >> I think you have (at least) 3 choices:
>
> >> 1. (preferred, I would think) you could remove all affected *probes*
> >> (via the creation of a SNP-affected-probe-free CDF) in advance, then
> >> run FIRMA as normal.  I can help with this if you tell me which
> >> probes are affected.
>
> >> 2. remove the affected *probesets* afterwards, since you may not
> >> believe the FIRMA scores computed for them.
>
> >> 3. as you suggested, only calculate FIRMA scores from unaffected
> >> residuals.  But, the information you require to do this is the same
> >> information required to do #1, and it seems like #1 is
> >> preferable.
>
> >> The good thing about option #1 is you would still have some ability
> >> to detect differential splicing for the probeset (instead of tossing it
> >> away), albeit with the smaller number of remaining unaffected probes.
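
To see what options #1 and #2 would each keep, one could count the probes per probeset before and after removing the affected ones. The sketch below is a toy illustration, not part of aroma.affymetrix: the function name is hypothetical, the probe-to-probeset pairs come from the probe.tab listing elsewhere in this thread (which is truncated, so the counts are not the real probeset sizes), and the affected probe IDs are invented.

```python
from collections import Counter

def probe_counts(probe_to_probeset, affected_probe_ids):
    """Return {probeset: (total_probes, unaffected_probes)}."""
    affected = set(affected_probe_ids)
    total = Counter(probe_to_probeset.values())
    lost = Counter(ps for probe, ps in probe_to_probeset.items()
                   if probe in affected)
    # Counter returns 0 for probesets that lost no probes.
    return {ps: (n, n - lost[ps]) for ps, n in total.items()}

demo_probe_to_probeset = {
    "494998": "2315101", "1734213": "2315101",
    "4767517": "2315101", "4286427": "2315101",
    "5760145": "2315102", "671410": "2315102",
    "4275780": "2315102", "4293462": "2315102",
    "5388": "2315103",
}
demo_affected = {"1734213", "5388"}  # invented for illustration

demo_counts = probe_counts(demo_probe_to_probeset, demo_affected)

# Option #1: a probeset survives if any unaffected probe remains.
option1_kept = sorted(ps for ps, (n, ok) in demo_counts.items() if ok > 0)
# Option #2: a probeset survives only if none of its probes is affected.
option2_kept = sorted(ps for ps, (n, ok) in demo_counts.items() if ok == n)

print(demo_counts)
print("option 1 keeps:", option1_kept)
print("option 2 keeps:", option2_kept)
```

In this toy case option #1 retains probeset 2315101 with three of its four probes, while option #2 discards it entirely.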
>
> >> Cheers,
> >> Mark
>
> >>> Sabrina
>
> >>> On Apr 30, 3:46 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >>>> Hi Sabrina.
>
> >>>> I have not had to deal with this myself, but I do know that the issue
> >>>> exists and I can at least suggest a possible route to exclude affected
> >>>> exons.
>
> >>>> Presumably, there is a database (dbSNP?) that tells you the genome
> >>>> locations of each SNP for your strains.  There is also a probe.tab
> >>>> file from Affymetrix that gives you the mapped genome locations of
> >>>> each probe (or you could take the sequences from the same file and
> >>>> map them yourself with a tool like BLAT).  It is then just a matter of
> >>>> checking whether each probe maps to a location on the genome that
> >>>> overlaps a SNP.  There is probably a Bioconductor tool for this, or
> >>>> you could create a hash, etc.
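
The overlap check described above might be sketched as follows. It assumes that probe and SNP coordinates are on the same genome build, use the same chromosome naming, and are 1-based with inclusive ends; none of that is guaranteed by the files themselves. The function names and the SNP positions are hypothetical; the probe coordinates come from the probe.tab listing elsewhere in this thread.

```python
from bisect import bisect_left
from collections import defaultdict

def index_snps(snps):
    """Group SNP positions by chromosome and sort them for binary search."""
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom

def probes_overlapping_snps(probes, snp_index):
    """Return the ids of probes whose [start, stop] interval holds a SNP.

    probes: iterable of (probe_id, chrom, start, stop), ends inclusive.
    """
    hits = set()
    for probe_id, chrom, start, stop in probes:
        positions = snp_index.get(chrom, [])
        # First SNP at or after the probe start; it overlaps if <= stop.
        i = bisect_left(positions, start)
        if i < len(positions) and positions[i] <= stop:
            hits.add(probe_id)
    return hits

demo_probes = [
    ("494998", "chr1", 1788, 1812),
    ("1734213", "chr1", 1973, 1997),
    ("4767517", "chr1", 1992, 2016),
]
# Made-up SNP positions, for illustration only.
demo_snps = [("chr1", 1995), ("chr2", 1800)]

demo_hits = probes_overlapping_snps(demo_probes, index_snps(demo_snps))
print(sorted(demo_hits))
```

Sorting the positions once and binary-searching per probe keeps the check fast even with millions of probes; a plain set of (chromosome, position) pairs tested against every base of each probe would also work.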
>
> >>>> There are a couple of levels at which you might introduce this to your
> >>>> analysis.  You could remove individual probes that are affected.
> >>>> On the aroma.affymetrix side, this would require creating a new CDF
> >>>> with those affected probes not included (a bit tricky but doable).  Or,
> >>>> you could simply post-process your existing results and remove
> >>>> probesets that have an affected probe (easier but not as elegant).
>
> >>>> You might've also seen:
>
> >>>> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A
> >>>> database for filtering out probes in the Affymetrix GeneChip(R)
> >>>> Human Exon 1.0 ST array potentially affected by SNPs.
> >>>> Bioinformation 2008, 2(10):469-470.
>
> >>>> Hope that gets you started.
>
> >>>> Cheers,
> >>>> Mark
>
> >>>> On 30/04/2009, at 6:07 AM, sabrina wrote:
>
> >>>>> Hi, all:
> >>>>> I am using Aroma for detecting exon skipping events between two
> >>>>> groups (two different strains). I found out that several of my top hits
> >>>>> indeed include at least one SNP between the two strains. I wonder if
> >>>>> anyone has suggestions about how to deal with this situation.
> >>>>> If I need to remove all affected exons from the analysis, how can I do
> >>>>> it? I have never worked with SNP data before; can anyone give me a hint?
> >>>>> Thanks a lot!
>
> >>>>> Sabrina
>
> >>>> ------------------------------
> >>>> Mark Robinson
> >>>> Epigenetics Laboratory, Garvan
> >>>> Bioinformatics Division, WEHI
> >>>> e: m.robin...@garvan.org.au
> >>>> e: mrobin...@wehi.edu.au
> >>>> p: +61 (0)3 9345 2628
> >>>> f: +61 (0)3 9347 0852
> >>>> ------------------------------