Re: [Bioc-devel] Confusing namespace issue with IRanges 1.99.17
Hi Leonardo --

On 07/07/2014 03:27 PM, Leonardo Collado Torres wrote:
Hello BioC-devel list,

I am currently confused by a namespace issue that I haven't been able to solve. To reproduce it, I made the simplest example I could think of.

Step 1: make some toy data and save it on your desktop

library(IRanges)
DF <- DataFrame(x = Rle(0, 10), y = Rle(1, 10))
save(DF, file = "~/Desktop/DF.Rdata")

Step 2: install the toy package on R 3.1.x

library(devtools)
install_github("lcolladotor/fooPkg")
# Note that it passes R CMD check

Step 3: in a new R session run

example("foo", "fooPkg")
# Change the location of DF.Rdata if necessary

You will see that when running the example, the session information printed lists:

other attached packages:
[1] fooPkg_0.0.1

loaded via a namespace (and not attached):
[1] BiocGenerics_0.11.3 IRanges_1.99.17 parallel_3.1.0 S4Vectors_0.1.0 stats4_3.1.0 tools_3.1.0

Then the startup message for loading IRanges is shown, which I was not expecting, and the following session info shows:

other attached packages:
[1] IRanges_1.99.17 S4Vectors_0.1.0 BiocGenerics_0.11.3 fooPkg_0.0.1

loaded via a namespace (and not attached):
[1] stats4_3.1.0 tools_3.1.0

Meaning that IRanges, S4Vectors and BiocGenerics all went from "loaded via a namespace" to "other attached packages".

All fooPkg::foo() does is use mapply() to go through a DataFrame and a list of indices to subset the data, as shown at
https://github.com/lcolladotor/fooPkg/blob/master/R/foo.R#L26
That is:

res <- mapply(function(x, y) { x[y] }, DF, index)

I thus thought that the only thing I would need to specify in the NAMESPACE is an import of the '[' method from IRanges. Checking with BiocCheck and codetoolsBioC suggests importing the mapply() method from BiocGenerics. Doing so doesn't change anything, and R still loads IRanges on that mapply() call. Importing the '[' method from S4Vectors doesn't help either.
Most intriguing: importing the whole of S4Vectors, BiocGenerics and IRanges still doesn't change the fact that IRanges is loaded when evaluating the same line of code shown above.

Any clues on what I am missing or doing wrong?

This comes from S4Vectors::extractROWS

selectMethod("extractROWS", c("Rle", "integer"))
Method Definition:

function (x, i)
{
    if (!suppressWarnings(require(IRanges, quietly = TRUE)))
        stop("Couldn't load the IRanges package. You need to install ",
            "the IRanges\n package in order to subset an Rle object.")
    ...

which moves the IRanges package from loaded to attached. Maybe that should use 'suppressPackageStartupMessages', or if (!"IRanges" %in% loadedNamespaces()) with functions referenced as IRanges:::...

In my use case, I'm trying to keep the namespace as small as possible (to minimize loading time) because it's for a tiny package that has a single function. This tiny package is then loaded in a BiocParallel::bplapply() call using BiocParallel::SnowParam(), which performs much better than BiocParallel::MulticoreParam() in terms of keeping memory under control.

Probably it is not desirable to move packages from loaded to attached, but I don't think this influences performance in a meaningful way?

Martin

Thank you for your help!
Leo

Leonardo Collado Torres, PhD student
Department of Biostatistics
Johns Hopkins University
Bloomberg School of Public Health
Website: http://www.biostat.jhsph.edu/~lcollado/
Blog: http://lcolladotor.github.io/

Full output from running the example:

example("foo", "fooPkg")

foo> ## Initial info
foo> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] fooPkg_0.0.1

loaded via a namespace (and not attached):
[1] BiocGenerics_0.11.3 IRanges_1.99.17 parallel_3.1.0 S4Vectors_0.1.0 stats4_3.1.0 tools_3.1.0

foo> ## Load data
foo> load("~/Desktop/DF.Rdata")

foo> ## Run function
foo> result <- foo(DF)
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] fooPkg_0.0.1

loaded via a namespace (and not attached):
[1] BiocGenerics_0.11.3 IRanges_1.99.17 parallel_3.1.0 S4Vectors_0.1.0 stats4_3.1.0 tools_3.1.0
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:
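The subsetting pattern inside foo() can be reproduced with base R alone, which helps separate the mapply() logic from the Rle/IRanges loading issue. A minimal sketch, assuming a plain data.frame in place of the DataFrame of Rle columns and a hypothetical `index` list; it only shows how mapply() pairs each column with its own subscript vector:

```r
## Base-R stand-in for the toy data: a data.frame is a list of columns,
## so mapply() walks over the columns and 'index' in parallel.
df <- data.frame(x = rep(0, 10), y = rep(1, 10))

## Hypothetical list of subscripts, one element per column
index <- list(1:3, 4:5)

## SIMPLIFY = FALSE keeps the result a named list; with the default
## (SIMPLIFY = TRUE), equal-length results would be collapsed into a matrix.
res <- mapply(function(x, y) x[y], df, index, SIMPLIFY = FALSE)
str(res)
```

With Rle columns, the `x[y]` step is where the Rle subsetting method (and, in this version, its require(IRanges) call) is reached.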
Re: [Bioc-devel] Confusing namespace issue with IRanges 1.99.17
Hi guys,

On 07/08/2014 05:29 AM, Michael Lawrence wrote:
This is why I tell people not to use require(). But what's with needing to load IRanges to subset an Rle? Is that temporary?

Very temporary. The source code of the extractROWS and replaceROWS methods for Rle objects actually contains the following comment:

    ## FIXME: Right now, the subscript 'i' is turned into an IRanges
    ## object so we need stuff that lives in the IRanges package for this
    ## to work. This is ugly/hacky and needs to be fixed (thru a redesign
    ## of this method).
    if (!suppressWarnings(require(IRanges, quietly=TRUE)))
        stop(...)
    ...

I introduced this hack last week when I moved the Rle code from IRanges to S4Vectors. It's temporary. The 2 methods need to be refactored, which I'm planning to do this week.

Cheers,
H.

Limiting imports is unlikely to reduce loading time. It may actually increase it. There are good reasons for it, though.
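The difference between loading and attaching that Martin and Michael point to can be checked with base R. A minimal sketch, using the base package stats4 as a stand-in for IRanges (an assumption: stats4 is not attached at the start of a default session). requireNamespace() loads a namespace without touching search(), whereas require() also attaches it:

```r
pkg <- "stats4"  # stand-in for IRanges; ships with every R installation

## Load the namespace only: functions become reachable as pkg::fun,
## but nothing is added to the search path.
if (!requireNamespace(pkg, quietly = TRUE))
    stop("Couldn't load the ", pkg, " package.")
loaded_only <- pkg %in% loadedNamespaces() &&
    !(paste0("package:", pkg) %in% search())

## require() loads *and* attaches, which is what moved IRanges, S4Vectors
## and BiocGenerics into "other attached packages" in sessionInfo().
suppressPackageStartupMessages(require(pkg, character.only = TRUE))
attached_now <- paste0("package:", pkg) %in% search()

c(loaded_only = loaded_only, attached_now = attached_now)
```

Inside package code, the requireNamespace() form combined with fully qualified calls (pkg::fun) keeps the user's search path unchanged.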
Re: [Bioc-devel] [BioC] granges() method for GenomicRanges objects akin to ranges()...
Hi,

On 05/13/2014 01:15 AM, Julian Gehring wrote:
Hi,

In summary, would it be feasible to add to 'GenomicRanges':

1) A 'granges(x, use.mcols=FALSE, ...)' method with signature 'GRanges' that converts to a 'GRanges' object and optionally drops the mcols (unless 'use.mcols' is TRUE)

Will do.

2) A 'dropMcols' or 'dropmcols' method with signature 'GRanges' that is a wrapper for mcols(x) <- NULL

How about setMcols(), which is more general than dropmcols()?

Thanks,
H.

If I can be of help in providing a patch for this, please let me know.

Best wishes
Julian

On 05.05.2014 23:29, Hervé Pagès wrote:
On 05/05/2014 02:12 PM, Cook, Malcolm wrote:
On 05/05/2014 01:00 PM, Cook, Malcolm wrote:
Wondering, is it too off the beaten track to expect

`mcols<-`(x, NULL)

to work? hint: it does not

args(`mcols<-`)
function (x, ..., value)

Arguments after the ellipsis must be named:

`mcols<-`(x, value=NULL)

Herve - Great - of course - so - does this not provide the means requested by the original poster?

I think Tim also wanted 'x' to be downgraded to a GRanges instance, like Julian's grangesPlain() does. We could use granges() for that. Deciding on an idiom that can be used inline for just dropping the mcols would be good too. `mcols<-`(x, value=NULL) is a little bit tricky, ugly, and error prone, as you noticed. These are probably enough reasons for not choosing it as *the* idiom. Its only advantage is that it doesn't introduce a new symbol.

H.

Nothing we can do about this.

Cheers,
H.

-----Original Message-----
From: bioc-devel-boun...@r-project.org [mailto:bioc-devel-boun...@r-project.org] On Behalf Of Hervé Pagès
Sent: Monday, May 05, 2014 1:28 PM
To: Kasper Daniel Hansen; Michael Lawrence
Cc: Johnston, Jeffrey; ttri...@usc.edu; bioc-devel@r-project.org; bioconduc...@r-project.org
Subject: Re: [Bioc-devel] [BioC] granges() method for GenomicRanges objects akin to ranges()...

Hi,

I have no problem using granges() for that.
Just to clarify:
(a) it would propagate the names()
(b) it would drop the metadata()
(c) the mcols() would propagate only if 'use.mcols=TRUE' was specified ('use.mcols' is FALSE by default)
(d) it would return a GRanges *instance*, i.e. input object 'x' would be downgraded to GRanges if it extends GRanges

@Kasper: granges() on SummarizedExperiment ignores the 'use.mcols' arg and always propagates the mcols. Alternatively you can use rowData(), which also propagates the mcols. granges() is actually just an alias for rowData() on SummarizedExperiment objects.

H.

On 05/05/2014 10:31 AM, Kasper Daniel Hansen wrote:
I agree with Michael on this. I can see why, in some usage cases, granges() is convenient to have with use.mcols=FALSE (which seems to have been added in the latest release). But in my usage of granges(), where I call granges() on objects like SummarizedExperiments and friends, I have been expecting granges() to give me the GRanges component of the object, not a crippled version of the GRanges component. This is, to me, very counter-intuitive and I wish I had seen this earlier. It is particularly frustrating that this default is part of the generic.

Best,
Kasper

On Mon, May 5, 2014 at 12:11 PM, Michael Lawrence lawrence.mich...@gene.com wrote:
In my opinion, granges() is not very clear as to the intent. The mcols are part of the GRanges, so why would calling granges() drop them? I think we want something similar to unclass(), unname(), etc. This is why I suggested dropmcols().

On Mon, May 5, 2014 at 8:17 AM, Tim Triche, Jr. tim.tri...@gmail.com wrote:
That's exactly what I was after -- the generic is already defined, so why not use it?

--t

On May 5, 2014, at 7:42 AM, Julian Gehring julian.gehr...@embl.de wrote:
Hi,

On 05.05.2014 16:22, Martin Morgan wrote:
generalize as setMcols, like setNames?

setMcols(x, NULL)

How about Tim's original suggestion, to add a 'granges' method that works on a 'GRanges' input? The current definition

granges(x, use.mcols=FALSE, ...)

seems suited for this.

Best wishes
Julian

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
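Why `mcols<-`(x, NULL) does not work can be shown without any Bioconductor package: a replacement function whose formals are (x, ..., value) swallows an unnamed second argument into the dots, leaving 'value' missing. A minimal sketch with a hypothetical `tag<-` replacement function (not part of any package) that stores its value as an attribute:

```r
## Hypothetical replacement function with the same formals as `mcols<-`
`tag<-` <- function(x, ..., value) {
    attr(x, "tag") <- value
    x
}

x <- 1:3
tag(x) <- "hello"   # replacement syntax: R passes the right-hand side as 'value'

## Positional call: NULL is captured by '...', so 'value' is missing
positional <- try(`tag<-`(x, NULL), silent = TRUE)

## Named call: works, and assigning NULL removes the attribute
named <- `tag<-`(x, value = NULL)

inherits(positional, "try-error")   # TRUE
is.null(attr(named, "tag"))         # TRUE
```

This is the same reason the thread recommends `mcols<-`(x, value=NULL) when the functional form is used inline.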
Re: [Bioc-devel] [BioC] granges() method for GenomicRanges objects akin to ranges()...
On 07/08/2014 11:29 AM, Michael Lawrence wrote:
On Tue, Jul 8, 2014 at 10:36 AM, Julian Gehring julian.gehr...@embl.de wrote:
Hi Herve,

2) A 'dropMcols' or 'dropmcols' method with signature 'GRanges' that is a wrapper for mcols(x) <- NULL

How about setMcols(), which is more general than dropmcols()?

Do you mean a function like:

setMcols <- function(x, value = NULL) {
    mcols(x) <- value
    return(x)
}

I'd be fine with this. However, some argued before that setting to NULL may be counterintuitive for non-advanced users.

Probably best to have both setMcols and dropMcols.

OK. Let's go for both.

Thanks,
H.

Best wishes
Julian

___
Bioconductor mailing list
bioconduc...@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
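The two helpers agreed on above could look roughly like this. A minimal sketch only: setMcols() follows Julian's proposal, dropMcols() is assumed to be a thin wrapper around it, and the names and behaviour eventually shipped in GenomicRanges/S4Vectors may differ. The helpers need a package defining the `mcols<-` replacement method (for example S4Vectors) to be loaded when they are called:

```r
## Modelled on stats::setNames(): return a modified copy so the call can be
## used inline, without a separate replacement statement.
setMcols <- function(x, value = NULL) {
    mcols(x) <- value   # dispatches to whatever `mcols<-` method fits class(x)
    x
}

## Spelled-out variant for users who find "set to NULL" unintuitive
dropMcols <- function(x) setMcols(x, NULL)
```

With GenomicRanges loaded, dropMcols(gr) would return a copy of gr without metadata columns; because R copies on modification, gr itself keeps its mcols.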
[Bioc-devel] makeTranscriptDbFrom... AnnotationHub
The recent TranscriptDb thread reminded me of a question: are there plans (or am I missing the function) to easily get a TranscriptDb out of the AnnotationHub objects? It would be great to have a preprocessed Ensembl txdb like we have for UCSC.

ah <- AnnotationHub()
gr <- ah$ensembl.release.73.gtf.homo_sapiens.Homo_sapiens.GRCh37.73.gtf_0.0.1.RData
gr
GRanges with 2268089 ranges and 12 metadata columns:
            seqnames         ranges strand |                 source
               <Rle>      <IRanges>  <Rle> |               <factor>
        [1]        1 [11869, 12227]      + |   processed_transcript
        [2]        1 [12613, 12721]      + |   processed_transcript
        [3]        1 [13221, 14409]      + |   processed_transcript
        [4]        1 [11872, 12227]      + | unprocessed_pseudogene
        [5]        1 [12613, 12721]      + | unprocessed_pseudogene
        ...      ...            ...    ... .                    ...
  [2268085]       MT [14747, 15887]      + |         protein_coding
  [2268086]       MT [14747, 15887]      + |         protein_coding
  [2268087]       MT [14747, 14749]      + |         protein_coding
  [2268088]       MT [15888, 15953]      + |                Mt_tRNA
  [2268089]       MT [15956, 16023]      - |                Mt_tRNA
                   type     score     phase     gene_id transcript_id
               <factor> <numeric> <integer> <character>   <character>
        [1]        exon        NA        NA ENSG0223972   ENST0456328
        [2]        exon        NA        NA ENSG0223972   ENST0456328
        [3]        exon        NA        NA ENSG0223972   ENST0456328
        [4]        exon        NA        NA ENSG0223972   ENST0515242
        [5]        exon        NA        NA ENSG0223972   ENST0515242
        ...         ...       ...       ...         ...           ...
  [2268085]        exon        NA        NA ENSG0198727   ENST0361789
  [2268086]         CDS        NA         0 ENSG0198727   ENST0361789
  [2268087] start_codon        NA         0 ENSG0198727   ENST0361789
  [2268088]        exon        NA        NA ENSG0210195   ENST0387460
  [2268089]        exon        NA        NA ENSG0210196   ENST0387461
            exon_number   gene_name   gene_biotype transcript_name
              <numeric> <character>    <character>     <character>
        [1]           1     DDX11L1     pseudogene     DDX11L1-002
        [2]           2     DDX11L1     pseudogene     DDX11L1-002
        [3]           3     DDX11L1     pseudogene     DDX11L1-002
        [4]           1     DDX11L1     pseudogene     DDX11L1-201
        [5]           2     DDX11L1     pseudogene     DDX11L1-201
        ...         ...         ...            ...             ...
  [2268085]           1      MT-CYB protein_coding      MT-CYB-201
  [2268086]           1      MT-CYB protein_coding      MT-CYB-201
  [2268087]           1      MT-CYB protein_coding      MT-CYB-201
  [2268088]           1       MT-TT        Mt_tRNA       MT-TT-201
  [2268089]           1       MT-TP        Mt_tRNA       MT-TP-201
                exon_id  protein_id
            <character> <character>
        [1] ENSE2234944          NA
        [2] ENSE3582793          NA
        [3] ENSE2312635          NA
        [4] ENSE2234632          NA
        [5] ENSE3608237          NA
        ...         ...         ...
  [2268085] ENSE1436074          NA
  [2268086]          NA ENSP0354554
  [2268087]          NA          NA
  [2268088] ENSE1544475          NA
  [2268089] ENSE1544473          NA
  ---
  seqlengths:
    1  2 ... MT
   NA NA ... NA
Re: [Bioc-devel] makeTranscriptDbFrom... AnnotationHub
Hi Michael,

On 07/08/2014 12:11 PM, Michael Love wrote:
The recent TranscriptDb thread reminded me of a question: are there plans (or am I missing the function) to easily get a TranscriptDb out of the AnnotationHub objects? It would be great to have a preprocessed Ensembl txdb like we have for UCSC.

I think the first thing we should do is have a makeTranscriptDbFromGRanges() function. It should not be too hard because we already have the code :) Marc wrote it. But it's currently part of the makeTranscriptDbFromGFF() function. Roughly speaking, this function does 2 things: (1) it imports the GFF or GTF file as a GRanges object, then (2) it turns that GRanges object into a TranscriptDb object. So we should move the code that does (2) into a separate function, makeTranscriptDbFromGRanges(), and have makeTranscriptDbFromGFF() call it internally. Then you could call makeTranscriptDbFromGRanges() on any of the GFF- or GTF-based GRanges objects you get from AnnotationHub.

We'll work on this soon and announce it here when it becomes available.

Cheers,
H.
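Step (2) includes, among other things, deriving each transcript's extent from its exon rows. A minimal base-R sketch of just that piece, assuming a plain data.frame holding the first five exon records from the AnnotationHub printout (IDs copied as they appear there). This is not Marc's makeTranscriptDbFromGFF() code. A real TranscriptDb also needs CDS ranges, exon ranks, gene mappings and chromosome information:

```r
## Exon rows copied from the GRanges printout (first five records)
exons <- data.frame(
    seqname       = "1",
    start         = c(11869L, 12613L, 13221L, 11872L, 12613L),
    end           = c(12227L, 12721L, 14409L, 12227L, 12721L),
    strand        = "+",
    transcript_id = c("ENST0456328", "ENST0456328", "ENST0456328",
                      "ENST0515242", "ENST0515242"),
    stringsAsFactors = FALSE
)

## A transcript spans from its leftmost exon start to its rightmost exon end
transcripts <- merge(
    aggregate(start ~ transcript_id + seqname + strand, data = exons, FUN = min),
    aggregate(end ~ transcript_id, data = exons, FUN = max),
    by = "transcript_id"
)
transcripts
```

In this excerpt, ENST0456328 spans [11869, 14409] and ENST0515242 spans [11872, 12721].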