Hi Mark,

I tried to run flat2Cdf. It ran almost to the end but then reported that
there was not enough memory. I am running Windows XP with 4 GB of RAM.
It did read the file, split the file into units, etc.

So I tried it on a Linux server, but there it gave me a read.table error
(with the same code I used on my PC):

>>>>>>
Reading TXT file ...Error in read.table(file, header = TRUE, colClasses = col.class, stringsAsFactors = FALSE, :
  unused argument(s) (stringsAsFactors ...)
Execution halted
>>>>>>>

Can you suggest possible solutions?

Thanks
Sabrina

On Jun 11, 9:41 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> Hi Sabrina.
>
> The Unit_ID can be any "transcript cluster" identifier of your
> choice. The easiest may be to use the Affymetrix transcript cluster
> identifier itself ... available from:
>
> http://www.affymetrix.com/analysis/downloads/current_exon/MoEx-1_0-st...
>
> See the 'transcript_cluster_id' column. Perhaps only take the "core"
> probes, as defined in the 'level' column?
>
> Note: we used Ensembl in that flat2Cdf() example since we were using a
> custom organization (i.e. non-Affy) of the probesets.
>
> Cheers,
> Mark
>
> On 11/06/2009, at 10:58 PM, sabrina wrote:
>
> > Hi, Mark:
> > For the Unit_ID, does it have to be an Ensembl gene ID like
> > ENSMUSGxxxx? Lots of genes do not have an Ensembl assignment in the
> > Affy annotation file. There are lots of missing annotations, and I
> > still have not found any good way to deal with it. Do you have any
> > suggestions?
> >
> > Thanks
> >
> > Sabrina
> >
> > On Jun 10, 12:32 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >> Hi Sabrina.
> >
> >> How about you try and create a 'flat' file like the one described
> >> at: http://groups.google.com/group/aroma-affymetrix/web/creating-cdf-file
> >> ...
> >
> >> Presumably, you will be comfortable with the Exon Array's 'probetab'
> >> file by now and possibly the Affymetrix annotation CSV file and so
> >> you should have access to all this information.
> >
> >> For example, from the following table:
> >
> >> mac1618:HuEx-1_0-st-v2.probe.tab mrobinson$ head HuEx-1_0-st-v2.probe.tab
> >> Probe ID  Probe Set ID  probe x  probe y  assembly       seqname  start  stop  strand  probe sequence             target strandedness  category
> >> 494998    2315101       917      193      build-34/hg16  chr1     1788   1812  +       CACGGGAAGTCTGGGCTAAGAGACA  Sense                main
> >> 1734213   2315101       1092     677      build-34/hg16  chr1     1973   1997  +       ACAGGGGCCAGAAGATGAACAATGG  Sense                main
> >> 4767517   2315101       796      1862     build-34/hg16  chr1     1992   2016  +       ATTAAGTTACATGCAGACAACAGGG  Sense                main
> >> 4286427   2315101       986      1674     build-34/hg16  chr1     2006   2030  +       TGCCTGGTTGTGGTATTAAGTTACA  Sense                main
> >> 5760145   2315102       144      2250     build-34/hg16  chr1     2520   2544  +       TCGGCCGTCGTCTTCTGCAGCTCTG  Sense                main
> >> 671410    2315102       689      262      build-34/hg16  chr1     2523   2547  +       AAGTCGGCCGTCGTCTTCTGCAGCT  Sense                main
> >> 4275780   2315102       579      1670     build-34/hg16  chr1     2526   2550  +       TCCAAGTCGGCCGTCGTCTTCTGCA  Sense                main
> >> 4293462   2315102       341      1677     build-34/hg16  chr1     2531   2555  +       TGTGATCCAAGTCGGCCGTCGTCTT  Sense                main
> >> 5388      2315103       267      2        build-34/hg16  chr1     2927   2951  +       CTGTCTGTCGACCCAGCTGGAGGCA  Sense                main
> >> [snip]
> >
> >> ... you see the second column is the probeset_id, which would be used
> >> as the "Group_ID" column for your flat file. Depending on whether
> >> you are using the Ensembl CDF or the Affymetrix annotation, you would
> >> need to create a mapping to get the transcript cluster id column
> >> (here, the "Unit_ID"). Everything else you need (Probe_Sequence, X,
> >> Y, Probe_ID) is within the table above.
> >
> >> Then, it would be just a matter of filtering OUT those probes that
> >> overlap a SNP, which based on your mapping exercise, you must have a
> >> list of. Then, make a call to the flat2Cdf() script and hopefully
> >> you'll be off and running.
> >
> >> Let me know how you go.
> >
> >> Cheers,
> >> Mark
> >
> >> On 10/06/2009, at 1:00 PM, sabrina wrote:
> >
> >>> Thanks, Mark!
> >>> Can you show me / walk me through how to get a new SNP-free CDF? I
> >>> finally got the right version of the SNP and probe mapping, so I am
> >>> ready to try it out!
> >
> >>> Sabrina
> >
> >>> On Jun 6, 3:14 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >>>> Hi Sabrina.
> >
> >>>> Comments below.
> >
> >>>> On 06/06/2009, at 1:57 AM, sabrina wrote:
> >
> >>>>> Hi, Mark:
> >>>>> I finally found the SNP data set that is suitable for my case.
> >>>>> As I understand, aroma used RMA to estimate gene level and exon
> >>>>> level intensities. After I estimate gene level (transcript
> >>>>> level), I can use FIRMA to estimate the residual for each exon
> >>>>> and compose a score as described in the paper. My question is: if
> >>>>> there is a SNP difference between two strains within one exon,
> >>>>> should I exclude that exon from estimating the transcript level
> >>>>> value? My guess is probably no.
> >
> >>>> If the SNP affects only 1 probe in an entire transcript, I would
> >>>> expect it to have very little impact on the gene-level summary.
> >>>> And, especially so if there are a large number of total probes for
> >>>> that gene. It may have a noticeable effect on the probe effect.
> >
> >>>>> So will it be a good idea if I exclude that exon after I
> >>>>> calculate all FIRMA scores, or should I exclude these exons after
> >>>>> I estimate residuals, but only use the residuals not affected by
> >>>>> SNPs for FIRMA score estimation? Thanks
> >
> >>>> Keep in mind the residuals are calculated at the probe-level, not
> >>>> the probeset-level. The FIRMA score is then a summary of all the
> >>>> residuals for a probeset.
> >
> >>>> I think you have (at least) 3 choices:
> >
> >>>> 1. (preferred, I would think) you could remove all affected
> >>>> *probes* (via the creation of a SNP-affected-probe-free CDF) in
> >>>> advance, then run FIRMA as normal. I can help with this if you
> >>>> tell me which probes are affected.
> >
> >>>> 2. remove the affected *probesets* afterwards, since you may not
> >>>> believe the FIRMA scores that are based on them.
> >
> >>>> 3. as you suggested, only calculate FIRMA scores from unaffected
> >>>> residuals. But, the information you require to do this is the same
> >>>> information required to do #1 and it would seem like #1 is
> >>>> preferred.
> >
> >>>> The good thing about option #1 is you would still have some
> >>>> ability to detect differential splicing for the probeset (instead
> >>>> of tossing it away), albeit with the smaller number of remaining
> >>>> unaffected probes.
> >
> >>>> Cheers,
> >>>> Mark
> >
> >>>>> Sabrina
> >
> >>>>> On Apr 30, 3:46 am, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >>>>>> Hi Sabrina.
> >
> >>>>>> I have not had to deal with this myself, but I do know that it
> >>>>>> exists and I can at least suggest a possible route to exclude
> >>>>>> affected exons.
> >
> >>>>>> Presumably, there is a database (dbSNP?) that tells you the
> >>>>>> genome locations of each SNP for your strains. There is also a
> >>>>>> probe.tab file from Affymetrix that gives you the mapped genome
> >>>>>> locations of each probe (or you could take the sequences from
> >>>>>> the same file and map them yourself with a tool like BLAT). It
> >>>>>> is then just a matter of looking whether each probe maps to a
> >>>>>> location on the genome that overlaps a SNP. There is probably a
> >>>>>> Bioconductor tool for this or you could create a hash, etc.
> >
> >>>>>> There are a couple levels at which you might introduce this to
> >>>>>> your analysis.
> >>>>>> You could remove individual probes that are affected. On the
> >>>>>> aroma.affymetrix side, this would require creating a new CDF
> >>>>>> with those affected probes not included (a bit tricky but
> >>>>>> doable). Or, you could simply post-process your existing results
> >>>>>> and remove probesets that have an affected probe (easier but not
> >>>>>> as elegant).
> >
> >>>>>> You might've also seen:
> >
> >>>>>> Duan S, Zhang W, Bleibel WK, Cox NJ, Dolan ME: SNPinProbe 1.0: A
> >>>>>> database for filtering out probes in the Affymetrix GeneChip(R)
> >>>>>> Human Exon 1.0 ST array potentially affected by SNPs.
> >>>>>> Bioinformation 2008, 2(10):469-470.
> >
> >>>>>> Hope that gets you started.
> >
> >>>>>> Cheers,
> >>>>>> Mark
> >
> >>>>>> On 30/04/2009, at 6:07 AM, sabrina wrote:
> >
> >>>>>>> Hi, all:
> >>>>>>> I am using Aroma for detecting exon skipping events around two
> >>>>>>> groups (two different strains). I found out that several of my
> >>>>>>> top hits indeed include at least one SNP between the two
> >>>>>>> strains. I wonder if anyone has some suggestion about how to
> >>>>>>> deal with this situation. If I need to remove all affected
> >>>>>>> exons from the analysis, how can I do it? I have never worked
> >>>>>>> with SNP data before; can anyone give me a hint? Thanks a lot!
> >
> >>>>>>> Sabrina
> >
> >>>>>> ------------------------------
> >>>>>> Mark Robinson
> >>>>>> Epigenetics Laboratory, Garvan
> >>>>>> Bioinformatics Division, WEHI
> >>>>>> e: m.robin...@garvan.org.au
> >>>>>> e: mrobin...@wehi.edu.au
> >>>>>> p: +61 (0)3 9345 2628
> >>>>>> f: +61 (0)3 9347 0852
> >>>>>> ------------------------------
> >
> >>>> ------------------------------
> >>>> Mark Robinson, PhD (Melb)
> >>>> Epigenetics Laboratory, Garvan
> >>>> Bioinformatics Division, WEHI
> >>>> e: m.robin...@garvan.org.au
> >>>> e: mrobin...@wehi.edu.au
> >>>> p: +61 (0)3 9345 2628
> >>>> f: +61 (0)3 9347 0852
> >>>> ------------------------------
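
The steps discussed in this thread (hash the SNP locations, drop every probe whose mapped interval overlaps a SNP, then write Probe_ID, X, Y, Probe_Sequence, Group_ID and Unit_ID to a flat file) can be sketched roughly as below. This is only an illustration in Python, not the flat2Cdf() script itself. The SNP position, the transcript cluster IDs and the function names are made up for the example, and the column order of the flat file is an assumption that should be checked against what flat2Cdf() expects. It also assumes the SNP positions and the probe.tab coordinates use the same genome build, with inclusive start/stop.

```python
import csv
import io
from collections import defaultdict

# Column names follow the flat-file columns named in the thread; the order
# that flat2Cdf() expects is an assumption and should be verified.
FLAT_COLUMNS = ["Probe_ID", "X", "Y", "Probe_Sequence", "Group_ID", "Unit_ID"]


def load_snp_positions(snps):
    """Build a hash of SNP positions: {seqname: {position, ...}}."""
    index = defaultdict(set)
    for seqname, position in snps:
        index[seqname].add(int(position))
    return index


def overlaps_snp(row, snp_index):
    """True if any base of the probe's mapped interval is a SNP position.

    Assumes 'start' and 'stop' are inclusive and share the SNP list's
    genome build and coordinate convention.
    """
    positions = snp_index.get(row["seqname"], set())
    start, stop = int(row["start"]), int(row["stop"])
    return any(base in positions for base in range(start, stop + 1))


def build_flat_rows(probe_rows, snp_index, unit_of):
    """Yield flat-file rows for probes that have a Unit_ID and hit no SNP."""
    for row in probe_rows:
        probeset = row["Probe Set ID"]
        if probeset not in unit_of or overlaps_snp(row, snp_index):
            continue  # unmapped probeset, or probe overlaps a SNP
        yield {
            "Probe_ID": row["Probe ID"],
            "X": row["probe x"],
            "Y": row["probe y"],
            "Probe_Sequence": row["probe sequence"],
            "Group_ID": probeset,
            "Unit_ID": unit_of[probeset],
        }


def write_flat(rows, handle):
    """Write rows as a tab-delimited table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FLAT_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Header plus three rows copied from the probe.tab excerpt in the thread.
PROBE_TAB = "\n".join("\t".join(fields) for fields in [
    ["Probe ID", "Probe Set ID", "probe x", "probe y", "assembly", "seqname",
     "start", "stop", "strand", "probe sequence", "target strandedness",
     "category"],
    ["494998", "2315101", "917", "193", "build-34/hg16", "chr1",
     "1788", "1812", "+", "CACGGGAAGTCTGGGCTAAGAGACA", "Sense", "main"],
    ["1734213", "2315101", "1092", "677", "build-34/hg16", "chr1",
     "1973", "1997", "+", "ACAGGGGCCAGAAGATGAACAATGG", "Sense", "main"],
    ["5760145", "2315102", "144", "2250", "build-34/hg16", "chr1",
     "2520", "2544", "+", "TCGGCCGTCGTCTTCTGCAGCTCTG", "Sense", "main"],
]) + "\n"

# Hypothetical inputs: one made-up SNP and made-up transcript cluster IDs.
snp_index = load_snp_positions([("chr1", 1980)])
unit_of = {"2315101": "TC_EXAMPLE_1", "2315102": "TC_EXAMPLE_1"}

reader = csv.DictReader(io.StringIO(PROBE_TAB), delimiter="\t")
kept = list(build_flat_rows(reader, snp_index, unit_of))

out = io.StringIO()
write_flat(kept, out)
print(out.getvalue())
```

With the made-up SNP at chr1:1980, the probe mapped to 1973-1997 is dropped and the other two are kept. For a real run, the reader would be opened on the actual probe.tab file, the SNP list would come from the strain-specific SNP data, and unit_of would be built from the 'transcript_cluster_id' column of the Affymetrix annotation file; the written table would then be handed to flat2Cdf().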