[Bioc-devel] thanks
Just a note to say thanks to those who worked on (1) the new biocViews search capabilities and (2) seqlevelsStyle<-. These are great improvements that have made tasks easier / faster time and time again. Yea! Val ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] need for consistent coordinate mapping API
Hi,

It's an interesting problem. Right now mapCoords() has some limitations. For example, I can use it to map from reference sequence to read cycle but not the other way around, or from reference genome to transcriptome but not the other way around (this reverse mapping is actually what the low-level utility transcriptLocs2RefLocs() does). Ideally we should be able to go back and forth easily between two coordinate spaces.

We also need to think about how more complex use cases (like Laurent's) could be handled by mapCoords(). That is not clear yet. We might need to change the design of mapCoords() a little bit, or maybe we'll need something else.

H.

On 09/19/2014 10:40 AM, Laurent Gatto wrote:
> On 19 September 2014 18:07, Michael Lawrence wrote:
>> [...]
> [...]

-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Re: [Bioc-devel] need for consistent coordinate mapping API
On 19 September 2014 18:07, Michael Lawrence wrote:
> Hi guys,
>
> This is the problem of mapping back and forth between coordinate spaces,
> such as between genomic and transcript space. I think there was some
> progress this release cycle (introduction of mapCoords generic, etc), but I
> think there is yet more to do. For example, transcriptLocs2RefLocs could be
> given a ranges-based wrapper that conforms to the mapCoords API somehow.
> Could we please put this on the TODO list of someone (in Seattle) for the
> next release cycle?

And I would be very interested in (and slowly working towards) generalising this to proteomics data. For now, there is a rather long description in [1], but eventually, it should be standardised. I was not aware of mapCoords and will read about it.

Laurent

[1] http://bioconductor.org/packages/devel/bioc/vignettes/Pbase/inst/doc/mapping.html

> Michael
[Bioc-devel] need for consistent coordinate mapping API
Hi guys,

This is the problem of mapping back and forth between coordinate spaces, such as between genomic and transcript space. I think there was some progress this release cycle (introduction of mapCoords generic, etc), but I think there is yet more to do. For example, transcriptLocs2RefLocs could be given a ranges-based wrapper that conforms to the mapCoords API somehow. Could we please put this on the TODO list of someone (in Seattle) for the next release cycle?

Michael
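To make the "back and forth" requirement concrete, here is a minimal base-R sketch of mapping between transcript and genomic coordinates in both directions. It is not the mapCoords() or transcriptLocs2RefLocs() implementation; the exon coordinates and the helper names tx_to_genome() and genome_to_tx() are hypothetical, and the transcript is assumed to lie on the + strand.

```r
# Hypothetical transcript on the + strand with three exons
# (genomic coordinates, 1-based, inclusive). Illustration only.
exon_start <- c(101, 201, 301)
exon_end   <- c(110, 220, 305)
exon_width <- exon_end - exon_start + 1           # 10, 20, 5
tx_offset  <- cumsum(c(0, head(exon_width, -1)))  # 0, 10, 30

# Transcript position -> genomic position.
tx_to_genome <- function(tx_pos) {
  stopifnot(all(tx_pos >= 1 & tx_pos <= sum(exon_width)))
  i <- findInterval(tx_pos - 1, tx_offset)        # exon containing tx_pos
  exon_start[i] + (tx_pos - 1 - tx_offset[i])
}

# Genomic position -> transcript position (NA when outside every exon).
genome_to_tx <- function(g_pos) {
  i <- pmax(findInterval(g_pos, exon_start), 1)   # candidate exon
  inside <- g_pos >= exon_start[i] & g_pos <= exon_end[i]
  ifelse(inside, tx_offset[i] + g_pos - exon_start[i] + 1, NA)
}

tx_to_genome(15)    # 205: fifth base of the second exon
genome_to_tx(205)   # 15: the round trip recovers the transcript position
genome_to_tx(150)   # NA: position 150 falls in an intron
```

The point of the sketch is that both directions can be derived from the same exon table, which is what a consistent API would need to expose.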
Re: [Bioc-devel] GenomicRanges::findOverlaps() ignoring chromosome information?
Hi all,

A concluding email for this thread: I found the problem in my code.

Everything was in the right place, except that I initialised the column meant to store the chromosome name with NA values (DMRs without hits are left with this NA if the user requires all DMRs in the return value). When I subsequently inserted the chromosome name for the DMRs hitting an annotated gene, the character value was converted to a numeric value, because a column initialised with NA is of class "logical". This is where the actual chromosome name was converted to a numeric value, often different from the original chromosome name. When I then prefixed that value with "chr", converting the column to class "character", no trace of the undesired conversion was left.

Anyway, for those interested, I attach the two functions I wrote (and corrected):

- OverlapDmrs.Gene
  - Takes the output data.frame "dmrs" from bsseq, a GRanges object obtained from a UCSC gene track, and some optional arguments
  - Finds DMRs overlapping annotated genes, and returns a table with the coordinates and Ensembl identifier of each gene
- OverlapDmrs.Cpg
  - Same as above, except that it expects a GRanges object from a UCSC CpG track
  - Annotates with the coordinates of an overlapping CpG island

I also attach an example data.frame of DMRs obtained using bsseq, as described in my first email. I believe all the code needed for testing is there. Feel free to give me feedback on this.

Apologies for the spam and the relatively obvious mistake on my part.

Cheers,
Kevin

On 19 September 2014 12:21, Kevin Rue-Albrecht wrote:
> [...]

-- 
Kévin RUE-ALBRECHT
Wellcome Trust Computational Infection Biology PhD Programme
University College Dublin
Ireland
http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en
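For readers who want to reproduce this kind of silent conversion, here is a minimal base-R sketch. It assumes the chromosome names were held as a factor at the point of assignment; this is an assumption, since the original functions are not shown in the thread, but it is one common way a chromosome label ends up as an unrelated number. The variable names are hypothetical.

```r
# Hypothetical chromosome names held as a factor (an assumption about
# the original code, which is not shown in the thread).
chrom <- factor(c("12", "3", "X"), levels = c("12", "3", "X"))

# A column initialised with plain NA is of class "logical".
bad <- rep(NA, 3)
stopifnot(is.logical(bad))

# Assigning a factor element stores its integer level code, not its label.
bad[2] <- chrom[2]
stopifnot(is.integer(bad), identical(bad[2], 2L))
bad_label <- paste0("chr", bad[2])    # "chr2", although the real name is "3"

# Safer: initialise with NA_character_ and convert explicitly.
good <- rep(NA_character_, 3)
good[2] <- as.character(chrom[2])
good_label <- paste0("chr", good[2])  # "chr3"
```

Initialising the column with NA_character_ and calling as.character() on the value keeps the column's type explicit, so a later paste0("chr", ...) cannot hide a wrong value.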
Re: [Bioc-devel] GenomicRanges::findOverlaps() ignoring chromosome information?
Hi again,

An update on my issue, although I haven't found the source of the error yet. I get correct overlaps in one scenario, but not in another. This suggests that findOverlaps() works as expected on my data, but I don't yet see where the error is in the second scenario. Let me explain:

- When I use my function OverlapDmrs.Gene with argument only.hits=TRUE, all the hits make perfect sense.
  - Full command: dmrs_gene = OverlapDmrs.Gene(dmrs=dmrs, gene_track=ensGene.asFeatures, only.hits=TRUE, prefix.chr=TRUE)
- When I use my function OverlapDmrs.Gene with argument only.hits=FALSE, the correct DMRs are annotated with the right start and stop positions, but with an incorrect chromosome value (the strangest thing is that chromosome 30 should not exist in *Bos taurus*, yet some hits show this value in the chromosome column).
  - Full command: dmrs_gene.all = OverlapDmrs.Gene(dmrs=dmrs, gene_track=ensGene.asFeatures, only.hits=FALSE, prefix.chr=T)

...

Now that I have written that "out loud", I have an idea of where to look for the source of the problem. Apologies for the spam, but if I find the solution, I'll definitely bring a conclusion to this thread.

Kevin

On 19 September 2014 10:12, Kevin Rue-Albrecht wrote:
> [...]
[Bioc-devel] GenomicRanges::findOverlaps() ignoring chromosome information?
Dear maintainer, Dear all,

*Situation*
I have used the findOverlaps() function to annotate differentially methylated regions (DMRs) obtained using the bsseq Bioconductor package in the *Bos taurus* genome. (No, you won't steal my experimental design :-P ). I used the genome UMD3.1.75 as a reference for my analysis.

*Problem*
The genes found to overlap the DMR genomic ranges are often on a different chromosome than the DMR, although the start and end coordinates of DMR and gene do overlap in all cases. This leads me to believe that the chromosome information is ignored in findOverlaps(). Is this the case, or am I using the function incorrectly? Note that a "true hit" is sometimes returned, i.e. the overlapping gene is present on the same chromosome, with start and end overlapping the coordinates of the DMR.

*Attached for your use/testing:*

- dmrs variable
- script used to annotate dmrs with information about overlapping gene
  - Note that I have tried setting select to arbitrary, first and last, always with the same issue. I would prefer to get a single hit at this stage rather than filter afterwards, but the latter remains a possible option if necessary.

Any help / solution / feedback welcome!

Best regards,
Kevin
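A quick way to check whether findOverlaps() takes the chromosome into account is a toy example in which two ranges share coordinates but sit on different chromosomes. The sketch below assumes the GenomicRanges package is installed; the ranges and the object names dmrs_gr and genes_gr are made up for illustration and are not the attached data.

```r
library(GenomicRanges)  # Bioconductor package; also attaches IRanges

# Hypothetical DMRs: identical coordinates on chr1 and chr2.
dmrs_gr <- GRanges(seqnames = c("chr1", "chr2"),
                   ranges   = IRanges(start = c(100, 100), end = c(200, 200)))

# Hypothetical genes: one on chr2 overlapping 100-200, one far away on chr1.
genes_gr <- GRanges(seqnames = c("chr2", "chr1"),
                    ranges   = IRanges(start = c(150, 5000), end = c(250, 6000)))

# select = "first" returns one subject index per query, NA when there is no hit.
first_hit <- findOverlaps(dmrs_gr, genes_gr, select = "first")
first_hit  # NA 1: the chr1 DMR has no hit, the chr2 DMR hits gene 1
```

If the chromosome were ignored, the chr1 DMR would also hit gene 1, because their coordinates overlap. It does not, so a mismatch in an annotated table more likely arises after the overlap step.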