[aroma.affymetrix] Re: SNPs affecting exon splicing detection

sabrina Mon, 22 Jun 2009 12:30:54 -0700

Hi Mark,
I tried to run flat2Cdf(). It ran almost to the end but then told me
that there was not enough memory; I am running Windows XP with 4 GB of
RAM. It did read the file, split it into units, etc. So I tried it on a
Linux server, but there it gave me a read.table() error (same code I
used on my PC):
>>>>>>
Reading TXT file ...Error in read.table(file, header = TRUE,
colClasses = col.class, stringsAsFactors = FALSE,  :
        unused argument(s) (stringsAsFactors ...)
Execution halted
>>>>>>>
Can you suggest possible solutions? Thanks


Sabrina

On Jun 11, 9:41 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> Hi Sabrina.
>
> The Unit_ID can be any "transcript cluster" identifier of your  
> choice.  The easiest may be to use the Affymetrix transcript cluster  
> identifier itself ... available from:
>
> http://www.affymetrix.com/analysis/downloads/current_exon/MoEx-1_0-st...
>
> See the 'transcript_cluster_id' column.  Perhaps only take the "core"
> probes, as defined in the 'level' column?
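
The "core"-only mapping suggested above could be sketched roughly as follows. This is a minimal Python sketch, not part of aroma.affymetrix: it assumes the annotation CSV has 'probeset_id', 'transcript_cluster_id' and 'level' columns and that its leading comment lines start with '#'. The function name and the example rows are made up for illustration.

```python
import csv
import io

def core_probeset_to_cluster(lines):
    """Map probeset_id -> transcript_cluster_id, keeping 'core' probesets only."""
    # Assumption: comment lines at the top of the annotation CSV start with '#'.
    rows = csv.DictReader(line for line in lines if not line.startswith("#"))
    return {
        row["probeset_id"]: row["transcript_cluster_id"]
        for row in rows
        if row["level"].strip().lower() == "core"
    }

# Made-up rows standing in for the real annotation file.
example = io.StringIO(
    '#%comment=made-up rows for illustration\n'
    '"probeset_id","transcript_cluster_id","level"\n'
    '"2315101","2315100","core"\n'
    '"2315102","2315100","extended"\n'
    '"2315103","2315100","core"\n'
)
mapping = core_probeset_to_cluster(example)
```

With a real file, an open file handle would be passed in place of the StringIO object; the resulting dictionary supplies the Unit_ID for each Group_ID.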
>
> Note: we used Ensembl in that flat2Cdf() example since we were using a  
> custom organization (i.e. non-Affy) of the probesets.
>
> Cheers,
> Mark
>
> On 11/06/2009, at 10:58 PM, sabrina wrote:
>
> > Hi, Mark:
> > For the Unit_ID, does it have to be an Ensembl gene ID like ENSMUSGxxxx?
> > Lots of genes do not have an Ensembl assignment in the Affy annotation
> > file. There are lots of missing annotations, and I still have not found
> > a good way to deal with them. Do you have any suggestions?
>
> > Thanks
>
> > Sabrina
>
> > On Jun 10, 12:32 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >> Hi Sabrina.
>
> >> How about you try and create a 'flat' file like the one described at:
> >> http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file...
>
> >> Presumably, you will be comfortable with the Exon Array's 'probetab'
> >> file by now, and possibly the Affymetrix annotation CSV file, so you
> >> should have access to all this information.
>
> >> For example, from the following table:
>
> >> mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
> >> Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
> >> 494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
> >> 1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACAGGGGCCAGAAGATGAACAATGG  Sense                main
> >> 4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
> >> 4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
> >> 5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
> >> 671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
> >> 4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
> >> 4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
> >> 5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
> >> [snip]
> >> [snip]
>
> >> ... you see the second column is the probeset_id, which would be used
> >> as the "Group_ID" column for your flat file.  Depending on whether you
> >> are using the Ensembl CDF or the Affymetrix annotation, you would need
> >> to create a mapping to get the transcript cluster id column (here, the
> >> "Unit_ID").  Everything else you need (Probe_Sequence, X, Y, Probe_ID)
> >> is within the table above.
>
> >> Then, it would be just a matter of filtering OUT those probes that
> >> overlap a SNP; based on your mapping exercise, you must already have a
> >> list of these.  Then, make a call to the flat2Cdf() script and hopefully
> >> you'll be off and running.
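
The flat-file step described above might look something like this in Python, as preprocessing before flat2Cdf() is called in R. It is only a sketch under stated assumptions: the probe.tab file is taken to be tab-delimited with the header names shown in the table, the flat-file column order is a guess, and the function names, the SNP-affected probe set and the probeset-to-unit mapping are hypothetical.

```python
import csv
import io

# Column names are the ones mentioned in the thread; their order is an assumption.
FLAT_COLUMNS = ["Probe_ID", "X", "Y", "Probe_Sequence", "Group_ID", "Unit_ID"]

def build_flat_rows(probe_tab, probeset_to_unit, snp_probe_ids):
    """Yield flat-file rows for probes that are mapped and not SNP-affected."""
    for row in csv.DictReader(probe_tab, delimiter="\t"):
        if row["Probe ID"] in snp_probe_ids:
            continue  # probe overlaps a SNP: filter it OUT
        unit_id = probeset_to_unit.get(row["Probe Set ID"])
        if unit_id is None:
            continue  # probeset has no transcript cluster in the mapping
        yield {
            "Probe_ID": row["Probe ID"],
            "X": row["probe x"],
            "Y": row["probe y"],
            "Probe_Sequence": row["probe sequence"],
            "Group_ID": row["Probe Set ID"],
            "Unit_ID": unit_id,
        }

def write_flat_file(rows, out):
    """Write the rows as a tab-delimited table with a header line."""
    writer = csv.DictWriter(out, fieldnames=FLAT_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Three rows copied from the table above; the mapping and SNP list are made up.
header = ("Probe ID\tProbe Set ID\tprobe x\tprobe y\tassembly\tseqname\tstart\t"
          "stop\tstrand\tprobe sequence\ttarget strandedness\tcategory\n")
probe_tab = io.StringIO(
    header
    + "494998\t2315101\t917\t193\tbuild-34/hg16\tchr1\t1788\t1812\t+\t"
      "CACGGGAAGTCTGGGCTAAGAGACA\tSense\tmain\n"
    + "1734213\t2315101\t1092\t677\tbuild-34/hg16\tchr1\t1973\t1997\t+\t"
      "ACAGGGGCCAGAAGATGAACAATGG\tSense\tmain\n"
    + "5760145\t2315102\t144\t2250\tbuild-34/hg16\tchr1\t2520\t2544\t+\t"
      "TCGGCCGTCGTCTTCTGCAGCTCTG\tSense\tmain\n"
)
snp_probe_ids = {"1734213"}                 # hypothetical SNP-affected probe
probeset_to_unit = {"2315101": "2315100"}   # hypothetical transcript cluster id

out = io.StringIO()
write_flat_file(build_flat_rows(probe_tab, probeset_to_unit, snp_probe_ids), out)
flat_text = out.getvalue()
```

Here probe 1734213 is dropped because it is in the SNP list and probe 5760145 is dropped because its probeset has no unit in the mapping, so only probe 494998 is written. Whether flat2Cdf() expects exactly this column order, or any offset in X and Y, should be checked against the flat-file description linked above.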
>
> >> Let me know how you go.
>
> >> Cheers,
> >> Mark
>
> >> On 10/06/2009, at 1:00 PM, sabrina wrote:
>
> >>> Thanks, Mark!
> >>> Can you walk me through how to get a new SNP-free CDF? I finally
> >>> got the right version of the SNP and probe mapping, so I am ready
> >>> to try it out!
>
> >>> Sabrina
>
> >>> On Jun 6, 3:14 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >>>> Hi Sabrina.
>
> >>>> Comments below.
>
> >>>> On 06/06/2009, at 1:57 AM, sabrina wrote:
>
> >>>>> Hi, Mark:
> >>>>> I finally found a SNP data set that is suitable for my case. As I
> >>>>> understand it, aroma uses RMA to estimate gene-level and exon-level
> >>>>> intensities. After I estimate the gene level (transcript level), I
> >>>>> can use FIRMA to estimate the residual for each exon and compute a
> >>>>> score as described in the paper. My question is: if there is a SNP
> >>>>> difference between the two strains within one exon, should I exclude
> >>>>> that exon from estimating the transcript-level value? My guess is
> >>>>> probably no.
>
> >>>> If the SNP affects only 1 probe in an entire transcript, I would
> >>>> expect it to have very little impact on the gene-level summary,
> >>>> especially if there are a large number of total probes for that
> >>>> gene.  It may have a noticeable effect on the probe effect.
>
> >>>>> So would it be a good idea to exclude that exon after I calculate
> >>>>> all the FIRMA scores? Or should I exclude these exons after I
> >>>>> estimate the residuals, and use only the residuals not affected by
> >>>>> SNPs for the FIRMA score estimation? Thanks
>
> >>>> Keep in mind the residuals are calculated at the probe level, not the
> >>>> probeset level.  The FIRMA score is then a summary of all the
> >>>> residuals for a probeset.
>
> >>>> I think you have (at least) 3 choices:
>
> >>>> 1. (preferred, I would think) you could remove all affected *probes*
> >>>> (via the creation of a SNP-affected-probe-free CDF) in advance, then
> >>>> run FIRMA as normal.  I can help with this if you tell me which
> >>>> probes are affected.
>
> >>>> 2. remove the affected *probesets* afterwards, since you may not
> >>>> believe the FIRMA scores that are based on them.
>
> >>>> 3. as you suggested, only calculate FIRMA scores from unaffected
> >>>> residuals.  But the information you require to do this is the same
> >>>> information required to do #1, and it would seem that #1 is
> >>>> preferred.
>
> >>>> The good thing about option #1 is you would still have some ability
> >>>> to detect differential splicing for the probeset (instead of tossing
> >>>> it away), albeit with the smaller number of remaining unaffected
> >>>> probes.
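
To make option #3 concrete, here is a rough Python sketch of summarising probe-level residuals per probeset while skipping SNP-affected probes. It is not the aroma.affymetrix implementation: the summary is assumed to be a plain median for a single sample, any scaling of the residuals is left out, and the function name, probe names and residual values are invented.

```python
from collections import defaultdict
from statistics import median

def probeset_scores(residuals, probe_to_probeset, affected_probes):
    """Median residual per probeset, using only unaffected probes."""
    grouped = defaultdict(list)
    for probe_id, resid in residuals.items():
        if probe_id in affected_probes:
            continue  # ignore residuals from SNP-affected probes
        grouped[probe_to_probeset[probe_id]].append(resid)
    # A probeset whose probes are all affected is absent from the result.
    return {probeset: median(values) for probeset, values in grouped.items()}

# Invented residuals for one sample.
residuals = {"p1": 1.0, "p2": -8.0, "p3": 3.0, "p4": 0.5, "p5": -1.5}
probe_to_probeset = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B"}

scores_all = probeset_scores(residuals, probe_to_probeset, set())
scores_clean = probeset_scores(residuals, probe_to_probeset, {"p2"})
```

In this toy case, dropping the affected probe p2 moves the score for probeset A from 1.0 to 2.0, while probeset B is unchanged. Removing the same probes from the CDF (option #1) additionally keeps them out of the model fit itself, which this post-hoc filtering does not.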
>
> >>>> Cheers,
> >>>> Mark
>
> >>>>> Sabrina
>
> >>>>> On Apr 30, 3:46 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >>>>>> Hi Sabrina.
>
> >>>>>> I have not had to deal with this myself, but I do know that the
> >>>>>> problem exists and I can at least suggest a possible route to
> >>>>>> exclude affected exons.
>
> >>>>>> Presumably, there is a database (dbSNP?) that tells you the genome
> >>>>>> locations of each SNP for your strains.  There is also a probe.tab
> >>>>>> file from Affymetrix that gives you the mapped genome locations of
> >>>>>> each probe (or you could take the sequences from the same file and
> >>>>>> map them yourself with a tool like BLAT).  It is then just a matter
> >>>>>> of looking at whether each probe maps to a location on the genome
> >>>>>> that overlaps a SNP.  There is probably a Bioconductor tool for this,
> >>>>>> or you could create a hash, etc.
>
> >>>>>> There are a couple of levels at which you might introduce this to
> >>>>>> your analysis.  You could remove individual probes that are affected.
> >>>>>> On the aroma.affymetrix side, this would require creating a new CDF
> >>>>>> with those affected probes not included (a bit tricky but doable).
> >>>>>> Or, you could simply post-process your existing results and remove
> >>>>>> probesets that have an affected probe (easier but not as elegant).
>
> >>>>>> You might've also seen:
>
> >>>>>> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A
> >>>>>> database for filtering out probes in the Affymetrix GeneChip(R)
> >>>>>> Human Exon 1.0 ST array potentially affected by SNPs.
> >>>>>> Bioinformation 2008, 2(10):469-470.
>
> >>>>>> Hope that gets you started.
>
> >>>>>> Cheers,
> >>>>>> Mark
>
> >>>>>> On 30/04/2009, at 6:07 AM, sabrina wrote:
>
> >>>>>>> Hi, all:
> >>>>>>> I am using aroma to detect exon-skipping events between two
> >>>>>>> groups (two different strains). I found out that several of my top
> >>>>>>> hits do include at least one SNP between the two strains. I wonder
> >>>>>>> if anyone has a suggestion about how to deal with this situation.
> >>>>>>> If I need to remove all affected exons from the analysis, how can I
> >>>>>>> do it? I have never worked with SNP data before; can anyone give me
> >>>>>>> a hint? Thanks a lot!
>
> >>>>>>> Sabrina
>
> >>>>>> ------------------------------
> >>>>>> Mark Robinson
> >>>>>> Epigenetics Laboratory, Garvan
> >>>>>> Bioinformatics Division, WEHI
> >>>>>> e: m.robin...@garvan.org.au
> >>>>>> e: mrobin...@wehi.edu.au
> >>>>>> p: +61 (0)3 9345 2628
> >>>>>> f: +61 (0)3 9347 0852
> >>>>>> ------------------------------
>
> >>>> ------------------------------
> >>>> Mark Robinson, PhD (Melb)
> >>>> Epigenetics Laboratory, Garvan
> >>>> Bioinformatics Division, WEHI
> >>>> e: m.robin...@garvan.org.au
> >>>> e: mrobin...@wehi.edu.au
> >>>> p: +61 (0)3 9345 2628
> >>>> f: +61 (0)3 9347 0852
> >>>> ------------------------------