Hi, Mark:

On Nov 17, 11:46 pm, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> Hi Sabrina.
>
> Comments below.
>
> > Hi, Mark:
> > Thanks for the information. What worries me about using coordinates is
> > that coordinates change with the assembly, while sequences do not.
> > I don't know how many psrs are outside exon boundaries; I will look into
> > it. But here is an example:
> > group Number: 5092691:
> > CTTATCGAGATAAAAAGTGCTTCTGTGGGTCAATCTAGATATTGATAGATTTGGACTGGAGAAG
> >
> > When I used BLAT from Ensembl, it showed that it is outside the exon
> > boundary.
>
> I guess I mean: do you have an example of a (core) probe that maps to
> a location not within a bona fide exon? What's above is certainly
> not an Affymetrix 25-mer probe, so I'm a bit confused.

This is the sequence of the psr or from the unitGroup. The file I used was
downloaded from the Affy website. According to the website:

  "Probe set sequences consist of the contiguous genomic sequence starting
  at the beginning of the first probe and ending at the end of the last
  probe in the set as they are aligned to the genome. They are provided in
  the orientation they exist in the mRNA in 5'->3' direction."

So my hypothesis is that if all of the probes in a psr are in the exon
region, then the psr sequence should be in it as well, right?

> > Forgot to ask another question. Back to my original question, is there
> > an (easy) way to map from partial sequence (i.e. the probeset
> > sequence) to exon sequence in a batch mode or in R? Thanks!
>
> This is not really an aroma.affymetrix question and there are various
> answers to this on the Bioconductor mailing list. Personally, I
> usually use the findOverlaps() function in the IRanges package. It's
> super quick. Here is a rough sketch of an example (please don't take
> this and assume it will work for you ... this is just to direct you
> toward a way to do it):

I will look into it! Thanks a lot :)
Sabrina

> library(IRanges)
>
> # assume you have 3 corresponding vectors for the exons:
> # 'exonStart', 'exonEnd', 'exonChr'
> exonIranges <- mapply(IRanges, start = split(exonStart, exonChr),
>                       end = split(exonEnd, exonChr))
> exonL <- do.call(RangesList, exonIranges)
>
> # similarly, corresponding vectors for the 25-mer Affy probes
> # 'sp' is position, 'ch' is chromosome
> probeRanges <- mapply(IRanges, start = split(sp, ch),
>                       end = split(sp+24, ch))
> probeL <- do.call(RangesList, probeRanges)
>
> # you may wish to make sure that these lists cover the same chromosomes
> fo <- findOverlaps(probeL, exonL)
>
> I'll leave it as an exercise to the reader to unwrap the contents of
> the 'fo' object [Hint: see the as.table() method].
> In your case, you'd be interested to know how many probes do not
> "overlap" within exon boundaries and I'd guess that you'll want to be
> careful what set of probes you use (e.g. core only?).
>
> Hope that helps.
>
> Cheers,
> Mark
>
> > Of course, if I made a mistake when I compiled the CDF, then that is
> > a different story. I hope not.
> >
> > Any suggestions? Thanks!
> >
> > On Nov 16, 8:10 pm, Mark Robinson <mrobin...@wehi.edu.au> wrote:
> >> Hi folks.
> >>
> >> Note that you can download these directly with the R/Bioconductor
> >> package 'rtracklayer'. For example:
> >>
> >> library(rtracklayer)
> >> session <- browserSession("UCSC")
> >> q1 <- ucscTableQuery(session, "refGene",
> >>                      GenomicRanges(genome = "hg18"))
> >> refGene <- getTable(q1)
> >>
> >> Sabrina: I'm actually surprised that many probes lie outside exon
> >> boundaries. They were specifically designed to be inside. Of course,
> >> the array was designed on annotation from a few years ago, but still
> >> I would expect this to be minimal. Can you give some numbers on this?
> >> Or, some examples.
> >>
> >> Cheers,
> >> Mark
> >>
> >> On 17-Nov-09, at 3:41 AM, camelbbs wrote:
> >>
> >>> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
> >>> refFlat.txt.gz and refGene.txt.gz are similar.
> >>>
> >>> On Nov 15, 5:02 pm, sabrina <sabrina.s...@gmail.com> wrote:
> >>>> Hi, Jiang:
> >>>> Can you give me more detail about the UCSC RefGene file? Is there a
> >>>> link to download? Thanks!
> >>>>
> >>>> Sabrina
> >>>>
> >>>> On Nov 15, 11:38 am, camelbbs <camel...@gmail.com> wrote:
> >>>>
> >>>>> hi,
> >>>>> I think you can get the probeset coordinates from affy exon array
> >>>>> annotation files, and you can get exon coordinates from the UCSC
> >>>>> RefGene files. Then you can compare them.
> >>>>> Best,
> >>>>> Jiang
> >>>>>
> >>>>> On Nov 15, 6:56 am, sabrina <sabrina.s...@gmail.com> wrote:
> >>>>>
> >>>>>> Hello, all:
> >>>>>> I used FIRMA to find potential spliced genes.
> >>>>>> But as we know, the probesets from the Affy exon array could be
> >>>>>> outside exon boundaries or cover just part of an exon. I wonder,
> >>>>>> if I have the sequence of the probeset (which I get from the Affy
> >>>>>> website), how do I check in batch whether it is in an exon
> >>>>>> (ENSEMBL) region, and if it is, how do I get the entire exon
> >>>>>> sequence and coordinates? Thanks a lot!
> >>>>>>
> >>>>>> Sabrina
> >>>
> >>> --
> >>> When reporting problems on aroma.affymetrix, make sure 1) to run the
> >>> latest version of the package, 2) to report the output of
> >>> sessionInfo() and traceback(), and 3) to post a complete code
> >>> example.
> >>>
> >>> You received this message because you are subscribed to the Google
> >>> Groups "aroma.affymetrix" group.
> >>> To post to this group, send email to aroma-affymetrix@googlegroups.com
> >>> To unsubscribe from this group, send email to
> >>> aroma-affymetrix-unsubscr...@googlegroups.com
> >>> For more options, visit this group
> >>> at http://groups.google.com/group/aroma-affymetrix?hl=en
> >>
> >> ------------------------------
> >> Mark Robinson, PhD (Melb)
> >> Epigenetics Laboratory, Garvan
> >> Bioinformatics Division, WEHI
> >> e: m.robin...@garvan.org.au
> >> e: mrobin...@wehi.edu.au
> >> p: +61 (0)3 9345 2628
> >> f: +61 (0)3 9347 0852
> >> ------------------------------
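
A follow-up note on the question of how many probes fall outside exon boundaries. As far as I understand it, findOverlaps() reports any overlap by default, while "inside an exon" is a containment test (in current IRanges releases, findOverlaps(..., type = "within") should express that directly, and I believe the RangesList() constructor quoted in this thread has since been superseded by IRangesList()). The snippet below is only a minimal base R sketch of the containment idea, with no Bioconductor dependency. The helper name probesWithinExons and the toy coordinates are made up for illustration; 'sp', 'ch', 'exonStart', 'exonEnd' and 'exonChr' follow the vector names used in Mark's example, and the 25-mer probe length is taken from the thread.

```r
# Hypothetical helper (not part of IRanges or aroma.affymetrix):
# returns TRUE for each probe that lies entirely inside at least one
# exon on the same chromosome, FALSE otherwise.
probesWithinExons <- function(probeStart, probeChr,
                              exonStart, exonEnd, exonChr,
                              probeLength = 25L) {
  # inclusive end coordinate of each probe
  probeEnd <- probeStart + probeLength - 1L
  vapply(seq_along(probeStart), function(i) {
    sameChr <- exonChr == probeChr[i]
    any(sameChr & exonStart <= probeStart[i] & exonEnd >= probeEnd[i])
  }, logical(1))
}

# Toy coordinates, invented for illustration only
exonStart <- c(100L, 500L, 100L)
exonEnd   <- c(200L, 650L, 180L)
exonChr   <- c("chr1", "chr1", "chr2")

sp <- c(110L, 190L, 510L, 110L, 170L)            # probe start positions
ch <- c("chr1", "chr1", "chr1", "chr2", "chr2")  # probe chromosomes

inside <- probesWithinExons(sp, ch, exonStart, exonEnd, exonChr)

# One row per probe, flagged by whether it is fully contained in an exon
print(data.frame(ch, start = sp, end = sp + 24L, inside))
cat("Probes not fully inside an exon:", sum(!inside),
    "of", length(inside), "\n")
```

This loops over every probe against every exon, so it is fine for checking a handful of probesets but will be slow on a whole exon array; for that, the interval-tree approach behind findOverlaps() is the better route. Note also that it compares coordinates, so probe and exon positions must come from the same genome assembly.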