Re: [Rd] Import data set from another package?
On 02/03/2015 22:48, Therneau, Terry M., Ph.D. wrote:
> I've moved nlme from Depends to Imports in my coxme package. However, a few of the examples for lmekin use one of the data sets from nlme. This is on purpose, to show how the results are the same and how they differ.
>
> If I use data(nlme::ergoStool) the data is not found, data(nlme:::ergoStool) does no better. If I add importFrom(nlme, ergoStool) the error message is that ergoStool is not exported.
>
> There likely is a simple way, but I currently don't see it.

There were some off-the-mark suggestions in this thread. If you just want a dataset from a package, use

    data(ergoStool, package = "nlme")

In particular, it is somewhat wasteful to load a large namespace like nlme when it is not needed.

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
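A minimal sketch of the suggestion above, for readers who want to try it outside a package example. The `system.file()` guard and the separate environment `env` are assumptions added here for illustration; they are not part of the original advice.

```r
# Load one dataset from an installed package without attaching the package.
# Guard (an assumption for safety): nlme is a recommended package, but it
# may not be installed everywhere.
if (nzchar(system.file(package = "nlme"))) {
  env <- new.env()
  # Dataset name and package name are both passed as character strings.
  data("ergoStool", package = "nlme", envir = env)
  ergoStool <- get("ergoStool", envir = env)
  print(head(ergoStool))
  # Reports whether nlme is on the search path; data() does not attach it.
  print("package:nlme" %in% search())
}
```

Inside an example section of an Rd file, the plain call `data(ergoStool, package = "nlme")` is usually all that is needed; the extra environment only keeps the sketch from writing into the workspace.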
Re: [Rd] R-devel does not update the C++ returned variables
Hervé Pagès hpa...@fredhutch.org on Mon, 2 Mar 2015 13:00:47 -0800 writes:

> Hi,
>
> On 03/02/2015 12:18 PM, Dénes Tóth wrote:
>> On 03/02/2015 04:37 PM, Martin Maechler wrote:
>>>> On 2 March 2015 at 09:09, Duncan Murdoch wrote:
>>>> | I generally recommend that people use Rcpp, which hides a lot of the
>>>> | details. It will generate your .Call calls for you, and generate the
>>>> | C++ code that receives them; you just need to think about the real
>>>> | problem, not the interface. It has its own learning curve, but I think
>>>> | it is easier than using the low-level code that you need to work with .Call.
>>>>
>>>> Thanks for that vote, and I second that. And these days the learning is a lot flatter than it was a decade ago:
>>>>
>>>>     R> Rcpp::cppFunction("NumericVector doubleThis(NumericVector x) { return(2*x); }")
>>>>     R> doubleThis(c(1,2,3,21,-4))
>>>>     [1]  2  4  6 42 -8
>>>>     R>
>>>>
>>>> That defined, compiled, loaded and run/illustrated a simple function.
>>>>
>>>> Dirk
>>>
>>> Indeed impressive, ... and it also works with integer vectors, something also not 100% trivial when working with compiled code.
>>>
>>> When testing that, I went a step further:
>>>
>>>     ## now test:
>>>     require(microbenchmark)
>>>     i <- 1:10
>>
>> Note that the relative speed of the algorithms also depends on the size of the input vector. i + i becomes the winner for longer vectors (e.g. i <- 1:1e6), but a proper Rcpp version is still approximately twice as fast.
>
> The difference in speed is probably due to the fact that R does safe arithmetic. C or C++ do not:
>
>     doubleThisInt(i)
>     [1]  2147483642  2147483644  2147483646          NA -2147483646 -2147483644
>
>     2L * i
>     [1] 2147483642 2147483644 2147483646         NA         NA         NA
>     Warning message:
>     In 2L * i : NAs produced by integer overflow
>
> H.

Exactly, excellent, Hervé! Luke also told me so in a private message. And 'i+i' is looking up 'i' twice, which is relatively costly for very small i as in my example.

This (no safe integer arithmetic in C, but in R) is another good example {as Martin Morgan's} of why using Rcpp -- or .Call() directly -- may be too sharp-edged a sword and maybe should be advocated for good programmers only.

Martin

>>     Rcpp::cppFunction("NumericVector doubleThisNum(NumericVector x) { return(2*x); }")
>>     Rcpp::cppFunction("IntegerVector doubleThisInt(IntegerVector x) { return(2*x); }")
>>     i <- 1:1e6
>>     mb <- microbenchmark::microbenchmark(doubleThisNum(i), doubleThisInt(i),
>>                                          i*2, 2*i, i*2L, 2L*i, i+i, times=100)
>>     plot(mb, log="y", notch=TRUE)
>
>>>     (mb <- microbenchmark(doubleThis(i), i*2, 2*i, i*2L, 2L*i, i+i, times=2^12))
>>>     ## Lynne (i7; FC 20), R Under development ... (2015-03-02 r67924):
>>>     ## Unit: nanoseconds
>>>     ##           expr min  lq      mean median   uq   max neval cld
>>>     ##  doubleThis(i) 762 985 1319.5974   1124 1338 17831  4096   b
>>>     ##          i * 2 124 151  258.4419    164  221     4  4096  a
>>>     ##          2 * i 127 154  266.4707    169  216 20213  4096  a
>>>     ##         i * 2L 143 164  250.6057    181  234 16863  4096  a
>>>     ##         2L * i 144 177  269.5015    193  237 16119  4096  a
>>>     ##          i + i 152 183  272.6179    199  243 10434  4096  a
>>>
>>>     plot(mb, log="y", notch=TRUE)
>>>     ## hmm, looks like even the simple arithm. differ slightly ...
>>>     ##
>>>     ## ==> zoom in:
>>>     plot(mb, log="y", notch=TRUE, ylim = c(150,300))
>>>
>>>     dev.copy(png, file="mbenchm-doubling.png")
>>>     dev.off() # [ <- why do I need this here for png ??? ]
>>>     ##--> see the appended *png graphic
>>>
>>> Those who've learnt EDA or otherwise about boxplot notches will know that they provide somewhat informal but robust pairwise tests on approximate 5% level. From these, one *could* - possibly wrongly - conclude that 'i * 2' is significantly faster than both 'i * 2L' and also 'i + i', which I find astonishing, given that i is integer here...
>>>
>>> Probably no reason for deep thoughts here, but if someone is enticed, this may be slightly interesting to read.
>>>
>>> Martin Maechler, ETH Zurich

> -- 
> Hervé Pagès
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> E-mail: hpa...@fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
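The "safe arithmetic" point in the message above can be checked in plain R, without Rcpp or a compiler. The following is only a sketch; the helper `double_safely()` is a hypothetical name introduced here, not something proposed in the thread.

```r
# R checks integer arithmetic for overflow: the result is NA plus a warning,
# not a silently wrapped value as in unchecked C/C++ int arithmetic.
big <- .Machine$integer.max              # largest representable R integer
res <- suppressWarnings(2L * big)        # integer overflow
is.na(res)                               # TRUE

# Multiplying by a double avoids the overflow but changes the storage type.
typeof(2 * big)                          # "double"

# Hypothetical helper: stay in integer storage when that is safe,
# fall back to double storage when doubling would overflow.
double_safely <- function(i) {
  stopifnot(is.integer(i))
  limit <- .Machine$integer.max %/% 2L
  if (any(abs(i) > limit, na.rm = TRUE)) 2 * i else 2L * i
}

typeof(double_safely(1:3))               # "integer"
typeof(double_safely(big))               # "double"
```

The overflow check is part of what R's integer multiplication does on every element, which is consistent with the explanation given in the thread for the timing difference.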
Re: [Rd] Import data set from another package?
As I expected: there was something simple and obvious, which I somehow could not see. Thanks for the pointer.

Terry T.

On 03/03/2015 03:12 AM, Prof Brian Ripley wrote:
> On 02/03/2015 22:48, Therneau, Terry M., Ph.D. wrote:
>> I've moved nlme from Depends to Imports in my coxme package. However, a few of the examples for lmekin use one of the data sets from nlme. This is on purpose, to show how the results are the same and how they differ.
>>
>> If I use data(nlme::ergoStool) the data is not found, data(nlme:::ergoStool) does no better. If I add importFrom(nlme, ergoStool) the error message is that ergoStool is not exported.
>>
>> There likely is a simple way, but I currently don't see it.
>
> There were some off-the-mark suggestions in this thread. If you just want a dataset from a package, use
>
>     data(ergoStool, package = "nlme")
>
> In particular, it is somewhat wasteful to load a large namespace like nlme when it is not needed.
[Rd] Feature request: copy attributes in gzcon
The `gzcon` function both modifies and copies a connection object:

    # compressed text
    con1 <- url("http://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz")
    con2 <- gzcon(con1)

    # almost indistinguishable
    con1 == con2
    identical(summary(con2), summary(con1))

    # both support gzip
    readLines(con1, n = 3)
    readLines(con2, n = 3)

    # opening one opens both
    isOpen(con2)
    open(con1)
    isOpen(con2)

In the example, `con1` and `con2` are two different objects interfacing the same connection. It might seem as if gzcon has simply returned the modified connection object, but the documentation explains that it in fact creates a copy referencing the same connection but with a modified internal structure.

It is unclear to me how `con1` is different from `con2`, but given that they represent one and the same connection, would it be possible to make gzcon copy over attributes from the input connection to the output object? This would allow custom connection implementations such as the curl package to use attributes for storing additional metadata about the connection. Currently those attributes get dropped after calling gzcon on the connection:

    library(curl)
    con <- curl("http://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz")
    attr(con, "foo") <- "bar"
    con <- gzcon(con)
    attr(con, "foo")

It would be very helpful if gzcon would instead copy attributes onto the output object, such that any potential metadata about the connection as stored in attributes gets retained.
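Until something like this exists in base R, the requested behaviour can be approximated at user level. The wrapper below is a sketch only: `gzcon_keep_attrs()` is a hypothetical name, and the example uses a temporary local file instead of the URL above so that it runs without network access.

```r
# Hypothetical wrapper: call gzcon() and re-attach any attributes of the
# input connection that the returned object does not already define
# (so "class" and the internal connection id are left to gzcon()).
gzcon_keep_attrs <- function(con, ...) {
  old <- attributes(con)
  out <- gzcon(con, ...)
  extra <- setdiff(names(old), names(attributes(out)))
  for (nm in extra) attr(out, nm) <- old[[nm]]
  out
}

# Usage on a small gzip file written to a temporary location.
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "w")
writeLines(c("a", "b", "c"), gz)
close(gz)

con <- file(tmp, "rb")
attr(con, "foo") <- "bar"
con2 <- gzcon_keep_attrs(con)
kept <- attr(con2, "foo")      # the custom attribute survives
lines <- readLines(con2)       # decompression still works
close(con2)
```

This only carries the attributes across once; anything that later rebuilds the connection object would drop them again, which is why support inside gzcon itself is the cleaner fix.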
Re: [Rd] [R] Why does R replace all row values with NAs
On 03/03/2015 02:28 AM, Martin Maechler wrote:
> Diverted from R-help : as it gets into musing about new R language primitives
>
> William Dunlap wdun...@tibco.com on Fri, 27 Feb 2015 08:04:36 -0800 writes:
>
>> You could define functions like
>>
>>     is.true <- function(x) !is.na(x) & x
>>     is.false <- function(x) !is.na(x) & !x
>>
>> and use them in your selections. E.g.,
>>
>>     x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>>     x[is.true(x$c >= 6), ]
>>         a  b  c
>>     7   7  8  7
>>     10 10 11 10
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>
> Yes; the Matrix package has had these
>
>     is0 <- function(x) !is.na(x) & x == 0
>     isN0 <- function(x) is.na(x) | x != 0
>     is1 <- function(x) !is.na(x) & x # also == isTRUE componentwise

Note that using %in% to block propagation of NAs is about 2x faster:

    x <- sample(c(NA_integer_, 1:1), 50, replace=TRUE)
    microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x)
    Unit: milliseconds
                        expr       min        lq      mean   median        uq      max neval
     as.logical(x) %in% TRUE  6.034744  6.264382  6.999083  6.29488  6.346028 40.36472   100
               !is.na(x) & x 11.202808 11.402437 11.469101 11.44848 11.517576 11.90916   100

> namespace hidden for a while [note the comment of the last one!] and using them for readability in its own code.
>
> Maybe we should (again) consider providing some versions of these with R ?
>
> The Matrix package also has had fast
>
>     allFalse <- all0 <- function(x) .Call(R_all0, x)
>     anyFalse <- any0 <- function(x) .Call(R_any0, x)
>     ##
>     ## anyFalse <- function(x) isTRUE(any(!x))      ## ~= any0
>     ## any0 <- function(x) isTRUE(any(x == 0))      ## ~= anyFalse
>
> namespace hidden as well, already, which probably could also be brought to base R.
>
> One big reason to *not* go there (to internal C code) at all with R is that S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group generics) and 'is.na()' have been known and package writers have programmed methods for these.
>
> To ensure that S3 and S4 dispatch works correctly also inside such new internals is much less easily achieved, and so such a C-based internal function is0() would no longer be equivalent with !is.na(x) & x == 0 as soon as 'x' is an object with a '==', 'Compare' and/or an is.na() method.

Excellent point. Thank you! It really makes a big difference for developers who maintain a complex hierarchy of S4 classes and methods, when functions like is.true, anyFalse, etc..., which can be expressed in terms of more basic operations like ==, !=, !, is.na, etc..., just work out-of-the-box on objects for which these basic operations are defined.

There is conceptually a small set of building blocks, at least for objects with a vector-like or list-like semantic, that can be used to formally describe the semantic of many functions in base R. This is what the man page for anyNA does by saying:

    anyNA implements any(is.na(x))

even though the actual implementation differs, but that's ok, as long as anyNA is equivalent to doing any(is.na(x)) on any object for which building block is.na() is implemented.

Unfortunately there is no clearly identified set of building blocks in base R. For example, if I want the comparison operations to work on my object, I need to implement ==, >, <, !=, <=, and >= (the 'Compare' group generics) even though it should be enough to implement == and <=, because all the others can be described in terms of these 2 building blocks. unique/duplicated is another example (unique(x) is conceptually x[!duplicated(x)]). And so on...

Cheers,
H.

> OTOH, simple R versions such as your 'is.true', called 'is1' inside Matrix, may be optimizable a bit by the byte compiler (and jit and other such tricks) and still keep the full semantic including correct method dispatch.
>
> Martin Maechler, ETH Zurich
>
>> On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:
>>> Thank you very much, Duncan. All this being said: What would you say is the most elegant and most safe way to solve such a seemingly simple task? Thank you!
>>>
>>> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
>>>> On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
>>>>> So, Duncan, do I understand you correctly: When I use x$x6, R doesn't know if it's TRUE or FALSE, so it returns a logical value of NA.
>>>>
>>>> Yes, when x$x is NA. (Though I think you meant x$c.)
>>>>
>>>>> When this logical value is applied to a row, the R says: hell, I don't know if I should keep it or not, so, just in case, I am going to keep it, but I'll replace all the values in this row with NAs?
>>>>
>>>> Yes. Indexing with a logical NA is probably a mistake, and this is one way to signal it without actually triggering a warning or error.
>>>>
>>>> BTW, I should have mentioned that the example
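For readers joining the thread here, the behaviour under discussion can be reproduced with the data frame from Bill Dunlap's example. The sketch below uses only base R; the alternatives listed are common idioms and are not claimed to be the thread's preferred answer.

```r
x <- data.frame(a = 1:10, b = 2:11,
                c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# A logical NA in a row index does not drop the row: it produces a row
# that is NA in every column.
with_na <- x[x$c >= 6, ]
nrow(with_na)                        # 7: two real matches plus five NA rows

# is.true() as defined in the thread turns NA into FALSE before indexing.
is.true <- function(x) !is.na(x) & x
x[is.true(x$c >= 6), ]               # rows 7 and 10

# Two base R idioms with the same effect on this data.
x[which(x$c >= 6), ]                 # which() skips NA
subset(x, c >= 6)                    # subset() treats NA as FALSE
```

The NA rows are easy to miss in a large data frame because only their row names (NA, NA.1, ...) give them away.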
Re: [Rd] [R] Why does R replace all row values with NAs
On 3/3/15 1:26 PM, Hervé Pagès wrote:
> Note that using %in% to block propagation of NAs is about 2x faster:
>
>     x <- sample(c(NA_integer_, 1:1), 50, replace=TRUE)
>     microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x)
>     Unit: milliseconds
>                         expr       min        lq      mean   median        uq      max neval
>      as.logical(x) %in% TRUE  6.034744  6.264382  6.999083  6.29488  6.346028 40.36472   100
>                !is.na(x) & x 11.202808 11.402437 11.469101 11.44848 11.517576 11.90916   100

Unfortunately %in% does not preserve matrix dimensions:

    x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
    dim(x)
    [1] 50 10
    dim(!is.na(x) & x)
    [1] 50 10
    dim(as.logical(x) %in% TRUE)
    NULL

Stephanie
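Martin Maechler's argument in this thread for keeping helpers like is0() at R level is that `==` and is.na() dispatch on the class of their argument. A small sketch of what that buys, using a hypothetical S3 class `money` invented purely for illustration (amounts stored in cents, negative values meaning "unknown"):

```r
# R-level helper, as quoted in the thread from the Matrix package.
is0 <- function(x) !is.na(x) & x == 0

# Hypothetical class: not from the thread and not from any package.
money <- function(cents) structure(list(cents = cents), class = "money")

# Class-specific building blocks.
is.na.money <- function(x) x$cents < 0                # negative means unknown
"==.money" <- function(e1, e2) e1$cents == e2 * 100   # compare in currency units

m <- money(c(0, 250, -1, 0))

# is0() works unchanged because both building blocks dispatch on "money".
is0(m)                 # TRUE FALSE FALSE TRUE

# On a plain vector the default methods are used.
is0(c(0, NA, 3))       # TRUE FALSE FALSE
```

A C-level is0() would have to reproduce this dispatch explicitly to stay equivalent, which is the difficulty described in the thread.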
Re: [Rd] [R] Why does R replace all row values with NAs
Stephanie,

Actually, it's as.logical that isn't preserving matrix dimensions, because it coerces to a logical vector:

    x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
    dim(as.logical(x))
    NULL

~G

On Tue, Mar 3, 2015 at 2:09 PM, Stephanie M. Gogarten sdmor...@u.washington.edu wrote:
> Unfortunately %in% does not preserve matrix dimensions:
>
>     x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
>     dim(x)
>     [1] 50 10
>     dim(!is.na(x) & x)
>     [1] 50 10
>     dim(as.logical(x) %in% TRUE)
>     NULL
>
> Stephanie
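Each step of the expression discussed in this exchange can be checked separately on a small matrix. This is only an illustration with made-up data, not code from the thread.

```r
# A 2 x 3 integer matrix with one NA.
m <- matrix(c(NA, 1:5), nrow = 2)

dim(as.logical(m))     # NULL: as.logical() returns a plain vector
dim(m %in% 1:5)        # NULL: %in% returns a plain logical vector
dim(match(m, 1:5))     # NULL: match() returns a plain integer vector
dim(!is.na(m) & m)     # 2 3: element-wise operators keep the dimensions
```

Both as.logical() and %in% drop the dim attribute on their own, so either one is enough to lose the matrix shape.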
Re: [Rd] [R] Why does R replace all row values with NAs
On 03/03/2015 02:17 PM, Gabriel Becker wrote:
> Stephanie,
>
> Actually, it's as.logical that isn't preserving matrix dimensions, because it coerces to a logical vector:
>
>     x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
>     dim(as.logical(x))
>     NULL
>
> ~G

It's true, as.logical() doesn't help here but Stephanie is right, %in% does not preserve the dimensions either:

    dim(x %in% 1:5)
    NULL

That's because match() itself doesn't preserve the dimensions:

    dim(match(x, 1:5))
    NULL

So maybe my fast is.true() should be:

    is.true <- function(x)
    {
        ans <- as.logical(x) %in% TRUE
        if (is.null(dim(x))) {
            names(ans) <- names(x)
        } else {
            dim(ans) <- dim(x)
            dimnames(ans) <- dimnames(x)
        }
        ans
    }

or something like that...

H.
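The "building blocks" idea raised earlier in this thread (all six comparison operators derivable from `==` and `<=`) can be sketched with an S3 group method. The class `ver` and the helpers `eq()` and `le()` are hypothetical names used only for this illustration, and the derivations assume a total order without missing values.

```r
# Hypothetical class: only eq() and le() are written by hand.
ver <- function(x) structure(list(x = x), class = "ver")
eq <- function(a, b) a$x == b$x
le <- function(a, b) a$x <= b$x

# One Ops method derives the remaining comparisons from the two primitives.
Ops.ver <- function(e1, e2) {
  switch(.Generic,
         "==" = eq(e1, e2),
         "<=" = le(e1, e2),
         "!=" = !eq(e1, e2),
         "<"  = le(e1, e2) & !eq(e1, e2),
         ">"  = !le(e1, e2),
         ">=" = !le(e1, e2) | eq(e1, e2),
         stop("operator not supported for 'ver': ", .Generic))
}

a <- ver(c(1, 2, 3))
b <- ver(c(2, 2, 2))
a < b      # TRUE FALSE FALSE
a >= b     # FALSE TRUE TRUE

# The unique/duplicated relationship mentioned in the thread, on a plain vector.
v <- c(3, 1, 3, 2, 1)
identical(unique(v), v[!duplicated(v)])
```

Base R does not do this derivation for you, so in practice each class carries its own version of this switch.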
Re: [Bioc-devel] Changes to the SummarizedExperiment Class
>> I still think GRanges should be a subclass of DataFrame, which would make this easy, but I don't seem to be winning that argument.
>
> Just impossible. As Michael mentioned back in November, they have conflicting APIs.

Maybe a new GRangesFrame that is a DataFrame and holds a GRanges (without mcols) as an index?

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Changes to the SummarizedExperiment Class
Seems like rowData could be made to work universally through coercion. rowRanges would not, however, and one would like a convenient mechanism to condition on whether range information is available. One way is to introduce a new class and rely on dispatch. But that adds complexity.

On Tue, Mar 3, 2015 at 2:44 PM, Gabe Becker becker.g...@gene.com wrote:
> Jim et al.,
>
> Why have two accessors (rowRanges, rowData), each of which are less flexible than the underlying structure and thus will fail (return NULL? or GRanges()/DataFrame() ?) in some proportion of valid objects?
>
> ~G
>
> On Tue, Mar 3, 2015 at 2:37 PM, Jim Hester james.f.hes...@gmail.com wrote:
>> Motivated by the discussion thread from November (https://stat.ethz.ch/pipermail/bioc-devel/2014-November/006686.html) the Bioconductor core team is planning on making changes to the SummarizedExperiment class. Our end goal is to allow the @rowData slot to become more flexible and hold either a DataFrame or GRanges type object.
>>
>> To this end we have currently deprecated the current rowData accessor in favor of a rowRanges accessor. This change has resulted in a few broken builds in devel, which we are in the process of fixing now. We will contact any package authors directly if needed for this migration.
>>
>> The rowData accessor will be deprecated in this release, however eventually the plan is to re-purpose this function to serve as an accessor for DataFrame data on the rows.
>>
>> Please let us know if you have any questions with the above and if you need any assistance with the transition.
>
> -- 
> Gabriel Becker, Ph.D
> Computational Biologist
> Genentech Research
Re: [Bioc-devel] Changes to the SummarizedExperiment Class
This. It would be damned near perfect as a return value for assays coming out of an object that held several such assays at several time points in a population, where there are both assay-wise and covariate-wise holes that could nonetheless be usefully imputed across assays.

Statistics is the grammar of science.
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty haverty.pe...@gene.com wrote:
>>> I still think GRanges should be a subclass of DataFrame, which would make this easy, but I don't seem to be winning that argument.
>>
>> Just impossible. As Michael mentioned back in November, they have conflicting APIs.
>
> Maybe a new GRangesFrame that is a DataFrame and holds a GRanges (without mcols) as an index?
Re: [Bioc-devel] Changes to the SummarizedExperiment Class
Jim et al.,

Why have two accessors (rowRanges, rowData), each of which are less flexible than the underlying structure and thus will fail (return NULL? or GRanges()/DataFrame() ?) in some proportion of valid objects?

~G

On Tue, Mar 3, 2015 at 2:37 PM, Jim Hester james.f.hes...@gmail.com wrote:
> Motivated by the discussion thread from November (https://stat.ethz.ch/pipermail/bioc-devel/2014-November/006686.html) the Bioconductor core team is planning on making changes to the SummarizedExperiment class. Our end goal is to allow the @rowData slot to become more flexible and hold either a DataFrame or GRanges type object.
>
> To this end we have currently deprecated the current rowData accessor in favor of a rowRanges accessor. This change has resulted in a few broken builds in devel, which we are in the process of fixing now. We will contact any package authors directly if needed for this migration.
>
> The rowData accessor will be deprecated in this release, however eventually the plan is to re-purpose this function to serve as an accessor for DataFrame data on the rows.
>
> Please let us know if you have any questions with the above and if you need any assistance with the transition.

-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research
Re: [Bioc-devel] Changes to the SummarizedExperiment Class
On 03/03/2015 03:06 PM, Peter Haverty wrote:

> I'd like to see a basic class that takes a DataFrame and a sub-class that
> takes a GRanges.

Yes.

> I still think GRanges should be a subclass of DataFrame, which would make
> this easy, but I don't seem to be winning that argument.

Just impossible. As Michael mentioned back in November, they have conflicting APIs.

> While the hood is up, can we try some different names?
> SummarizedExperiment never seemed like a great fit to me because it
> doesn't necessarily contain experiments or summaries thereof. It's a
> collection of like-sized rectangular things with metadata on the two
> dimensions. Maybe the name could reflect what it holds rather than a
> common use case? AnnotatedMatrixList?

We actually need 2 names: 1 for the parent class, 1 for the child. I'm starting to think that introducing 2 new names would maybe make the migration a little bit easier, especially since the plan is to move the refactored SummarizedExperiment to its own package. With 2 new names we can start the new package, implement the 2 new classes in it, and have the old SummarizedExperiment (in GenomicRanges) and the 2 new classes peacefully cohabit during the time of the migration.

Cheers,
H.

> Anyway, I'm excited to see a version on the way that takes a DataFrame as
> rowData. I'm glad you guys are working on that.
>
> Regards,
> Pete
>
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
>
> On Tue, Mar 3, 2015 at 2:57 PM, Michael Lawrence lawrence.mich...@gene.com wrote:
>
>> Seems like rowData could be made to work universally through coercion.
>> rowRanges would not, however, and one would like a convenient mechanism
>> to condition on whether range information is available. One way is to
>> introduce a new class and rely on dispatch. But that adds complexity.
>>
>> On Tue, Mar 3, 2015 at 2:44 PM, Gabe Becker becker.g...@gene.com wrote:
>>
>>> Jim et al.,
>>>
>>> Why have two accessors (rowRanges, rowData), each of which is less
>>> flexible than the underlying structure and thus will fail (return
>>> NULL? or GRanges()/DataFrame()?) in some proportion of valid objects?
>>>
>>> ~G
>>>
>>> On Tue, Mar 3, 2015 at 2:37 PM, Jim Hester james.f.hes...@gmail.com wrote:
>>>
>>>> Motivated by the discussion thread from November
>>>> (https://stat.ethz.ch/pipermail/bioc-devel/2014-November/006686.html)
>>>> the Bioconductor core team is planning on making changes to the
>>>> SummarizedExperiment class. Our end goal is to allow the @rowData slot
>>>> to become more flexible and hold either a DataFrame or GRanges type
>>>> object.
>>>>
>>>> To this end we have currently deprecated the current rowData accessor
>>>> in favor of a rowRanges accessor. This change has resulted in a few
>>>> broken builds in devel, which we are in the process of fixing now. We
>>>> will contact any package authors directly if needed for this
>>>> migration.
>>>>
>>>> The rowData accessor will be deprecated in this release; however,
>>>> eventually the plan is to re-purpose this function to serve as an
>>>> accessor for DataFrame data on the rows.
>>>>
>>>> Please let us know if you have any questions with the above and if you
>>>> need any assistance with the transition.
>>>
>>> --
>>> Gabriel Becker, Ph.D
>>> Computational Biologist
>>> Genentech Research

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
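The "parent class plus child class, rely on dispatch" idea discussed above can be sketched with base R's S4 system. This is only a minimal illustration under stated assumptions: the class names `RectList` and `RangedRectList`, the generic `hasRanges`, and the use of plain `data.frame`s as stand-ins for DataFrame and GRanges are all hypothetical and are not the classes the Bioconductor team planned or implemented.

```r
library(methods)

# Hypothetical parent class: a list of assays plus row metadata held in a
# plain data.frame (a stand-in for DataFrame).
setClass("RectList", slots = c(assays = "list", rowData = "data.frame"))

# Hypothetical child class: adds range information (a data.frame stand-in
# for GRanges) on top of everything the parent holds.
setClass("RangedRectList", contains = "RectList",
         slots = c(rowRanges = "data.frame"))

# Dispatch decides whether range information is available.
setGeneric("hasRanges", function(x) standardGeneric("hasRanges"))
setMethod("hasRanges", "RectList", function(x) FALSE)
setMethod("hasRanges", "RangedRectList", function(x) TRUE)

plain <- new("RectList",
             assays = list(counts = matrix(0, 2, 2)),
             rowData = data.frame(id = c("a", "b")))
ranged <- new("RangedRectList",
              assays = list(counts = matrix(0, 2, 2)),
              rowData = data.frame(id = c("a", "b")),
              rowRanges = data.frame(start = c(1, 10), end = c(5, 20)))

hasRanges(plain)
hasRanges(ranged)
```

Code written against the parent class keeps working on the child, which is one reason two cohabiting classes could ease a migration; the cost, as noted in the thread, is the extra class to maintain.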
Re: [Bioc-devel] Changes to the SummarizedExperiment Class
There are some nice similarities in these new imaginary types. A GRangesFrame is a list of dimensionally identical things (columns) and some row meta-data (the GRanges). The SE-like object is similarly a list of dimensionally like things (matrices, RleDataFrames, BigMatrix objects, HDF5-backed things) with some row meta-data (a DataFrame or GRangesFrame). Elegant? Maybe they would actually be relatives in the class tree. I wonder if this kind of thing would be easier if we had Java-style Interfaces or duck-typing. The x slot of y holds something that implements this set of methods ... Oh, and kinda apropos, the genoset class will probably go away or become an extension to this new SE-like thing. The extra stuff that comes along with genoset will still be available. Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr. tim.tri...@gmail.com wrote: This. It would be damned near perfect as a return value for assays coming out of an object that held several such assays at several time points in a population, where there are both assay-wise and covariate-wise holes that could nonetheless be usefully imputed across assays. Statistics is the grammar of science. Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty haverty.pe...@gene.com wrote: I still think GRanges should be a subclass of DataFrame, which would make this easy, but I don't seem to be winning that argument. Just impossible. As Michael mentioned back in November, they have conflicting APIs. Maybe a new GRangesFrame that is a DataFrame and holds a GRanges (without mcols) as an index? 
Re: [Rd] Errors on Windows with grep(fixed=TRUE) on UTF-8 strings
After a bit more investigation, I think I've found the cause of the bug, and I have a patch.

This bug happens with grep(), when:
* Running on Windows.
* The search uses fixed=TRUE.
* The search pattern is a single byte.
* The current locale has a multibyte encoding.

===
Here's an example that demonstrates the bug:

# First, create a 3-byte UTF-8 character
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"
y
# [1] "渗"

# In my default locale, grep with a single-char pattern and fixed=TRUE
# returns integer(0), as expected.
Sys.getlocale("LC_CTYPE")
# [1] "English_United States.1252"
grep("a", y, fixed = TRUE)
# integer(0)

# When using a multibyte locale, grep with a single-char
# pattern and fixed=TRUE results in an error.
Sys.setlocale("LC_CTYPE", "chinese")
grep("a", y, fixed = TRUE)
# Error in grep("a", y, fixed = TRUE) : invalid multibyte string at '<97>'

===
I believe the problem is in the main/grep.c file, in the fgrep_one function. It tests for a multi-byte character string locale `mbcslocale`, and then for `use_UTF8`, like so:

if (!useBytes && mbcslocale) {
    ...
} else if (!useBytes && use_UTF8) {
    ...
} else ...

This can be seen at
https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L668-L692

A similar pattern occurs in the fgrep_one_bytes function, at
https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L718-L736

I believe that the test order should be reversed; it should test first for `use_UTF8`, and then for `mbcslocale`. This pattern occurs in a few places in grep.c. It looks like this:

if (!useBytes && use_UTF8) {
    ...
} else if (!useBytes && mbcslocale) {
    ...
} else ...

===
This patch does what I described; it simply tests for `use_UTF8` first, and then `mbcslocale`, in both fgrep_one and fgrep_one_bytes. I made this patch against the 3.1.2 sources, and tested the example code above. In both cases, grep() returned integer(0), as expected.

(The reason I made this change against 3.1.2 is that I had problems getting the current trunk to compile on either Linux or Windows.)

diff --git src/main/grep.c src/main/grep.c
index 6e6ec3e..348c63d 100644
--- src/main/grep.c
+++ src/main/grep.c
@@ -664,27 +664,27 @@ static int fgrep_one(const char *pat, const char *target,
 	}
 	return -1;
     }
-    if (!useBytes && mbcslocale) { /* skip along by chars */
-	mbstate_t mb_st;
+    if (!useBytes && use_UTF8) {
 	int ib, used;
-	mbs_init(&mb_st);
 	for (ib = 0, i = 0; ib <= len-plen; i++) {
 	    if (strncmp(pat, target+ib, plen) == 0) {
 		if (next != NULL) *next = ib + plen;
 		return i;
 	    }
-	    used = (int) Mbrtowc(NULL, target+ib, MB_CUR_MAX, &mb_st);
+	    used = utf8clen(target[ib]);
 	    if (used <= 0) break;
 	    ib += used;
 	}
-    } else if (!useBytes && use_UTF8) {
+    } else if (!useBytes && mbcslocale) { /* skip along by chars */
+	mbstate_t mb_st;
 	int ib, used;
+	mbs_init(&mb_st);
 	for (ib = 0, i = 0; ib <= len-plen; i++) {
 	    if (strncmp(pat, target+ib, plen) == 0) {
 		if (next != NULL) *next = ib + plen;
 		return i;
 	    }
-	    used = utf8clen(target[ib]);
+	    used = (int) Mbrtowc(NULL, target+ib, MB_CUR_MAX, &mb_st);
 	    if (used <= 0) break;
 	    ib += used;
 	}
@@ -714,21 +714,21 @@ static int fgrep_one_bytes(const char *pat, const char *target, int len,
 	    if (*p == pat[0]) return i;
 	return -1;
     }
-    if (!useBytes && mbcslocale) { /* skip along by chars */
-	mbstate_t mb_st;
+    if (!useBytes && use_UTF8) { /* not really needed */
 	int ib, used;
-	mbs_init(&mb_st);
 	for (ib = 0, i = 0; ib <= len-plen; i++) {
 	    if (strncmp(pat, target+ib, plen) == 0) return ib;
-	    used = (int) Mbrtowc(NULL, target+ib, MB_CUR_MAX, &mb_st);
+	    used = utf8clen(target[ib]);
 	    if (used <= 0) break;
 	    ib += used;
 	}
-    } else if (!useBytes && use_UTF8) { /* not really needed */
+    } else if (!useBytes && mbcslocale) { /* skip along by chars */
+	mbstate_t mb_st;
 	int ib, used;
+	mbs_init(&mb_st);
 	for (ib = 0, i = 0; ib <= len-plen; i++) {
 	    if (strncmp(pat, target+ib, plen) == 0) return ib;
-	    used = utf8clen(target[ib]);
+	    used = (int) Mbrtowc(NULL, target+ib, MB_CUR_MAX, &mb_st);
 	    if (used <= 0) break;
 	    ib += used;
 	}

-Winston
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
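To see why the order of the two tests matters, it helps to look at the bytes of the example string. The sketch below is an illustration in R, not R's internal code: `utf8_lead_len` is a hypothetical helper that only mimics the idea of stepping through a UTF-8 string by reading each lead byte, which is what the `use_UTF8` branch does. A walker that instead interprets the same bytes in the locale's own multibyte encoding can trip over a byte such as 0x97, which is presumably what the error message is reporting.

```r
# Hypothetical helper (not R's internal utf8clen): length in bytes of a
# UTF-8 sequence, judged from its lead byte alone.
utf8_lead_len <- function(byte) {
  b <- as.integer(byte)
  if (b < 0x80L) {
    1L                # 0xxxxxxx: single-byte (ASCII)
  } else if (b < 0xC0L) {
    NA_integer_       # 10xxxxxx: continuation byte, cannot start a character
  } else if (b < 0xE0L) {
    2L                # 110xxxxx
  } else if (b < 0xF0L) {
    3L                # 1110xxxx
  } else {
    4L                # 11110xxx
  }
}

# The three bytes of the character used in the bug report.
bytes <- as.raw(c(0xe6, 0xb8, 0x97))

lead_len  <- utf8_lead_len(bytes[1])   # lead byte announces a 3-byte sequence
third_len <- utf8_lead_len(bytes[3])   # 0x97 is only valid as a continuation
```

Stepping by `lead_len` consumes the whole character in one move, so the walk never lands on 0x97 by itself.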
Re: [Rd] [R] Why does R replace all row values with NAs
Diverted from R-help, as it gets into musing about new R language "primitives".

>>>>> William Dunlap wdun...@tibco.com
>>>>>     on Fri, 27 Feb 2015 08:04:36 -0800 writes:

> You could define functions like
>
>   is.true <- function(x) !is.na(x) & x
>   is.false <- function(x) !is.na(x) & !x
>
> and use them in your selections. E.g.,
>
>   x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>   x[is.true(x$c >= 6), ]
>       a  b  c
>   7   7  8  7
>   10 10 11 10
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

Yes; the Matrix package has had these

  is0  <- function(x) !is.na(x) & x == 0
  isN0 <- function(x)  is.na(x) | x != 0
  is1  <- function(x) !is.na(x) & x   # also == "isTRUE componentwise"

namespace hidden for a while [note the comment of the last one!] and has been using them for readability in its own code.

Maybe we should (again) consider providing some versions of these with R?

The Matrix package also has had fast

  allFalse <- all0 <- function(x) .Call(R_all0, x)
  anyFalse <- any0 <- function(x) .Call(R_any0, x)
  ##
  ## anyFalse <- function(x) isTRUE(any(!x))      ## ~= any0
  ## any0 <- function(x) isTRUE(any(x == 0))      ## ~= anyFalse

namespace hidden as well, already, which probably could also be brought to base R.

One big reason to *not* go there (to internal C code) at all with R is that S3 and S4 dispatch for '==' ('!=', etc., the 'Compare' group generics) and 'is.na()' have been known and package writers have programmed methods for these. To ensure that S3 and S4 dispatch works correctly also inside such new internals is much less easily achieved, and so such a C-based internal function is0() would no longer be equivalent to

  !is.na(x) & x == 0

as soon as 'x' is an object with a '==', 'Compare' and/or an is.na() method.

OTOH, simple R versions such as your 'is.true', called 'is1' inside Matrix, may be optimizable a bit by the byte compiler (and jit and other such tricks) and still keep the full semantics including correct method dispatch.

Martin Maechler, ETH Zurich

>> On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:
>>
>> Thank you very much, Duncan.
>> All this being said: What would you say is the most elegant and most
>> safe way to solve such a seemingly simple task?
>> Thank you!
>>
>>> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
>>>
>>>> On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
>>>> So, Duncan, do I understand you correctly:
>>>> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
>>>> a logical value of NA.
>>>
>>> Yes, when x$x is NA. (Though I think you meant x$c.)
>>>
>>>> When this logical value is applied to a row, the R says: hell, I
>>>> don't know if I should keep it or not, so, just in case, I am going
>>>> to keep it, but I'll replace all the values in this row with NAs?
>>>
>>> Yes. Indexing with a logical NA is probably a mistake, and this is one
>>> way to signal it without actually triggering a warning or error.
>>>
>>> BTW, I should have mentioned that the example where you indexed using
>>> -which(x$c>=6) is a bad idea: if none of the entries were 6 or more,
>>> this would be indexing with an empty vector, and you'd get nothing,
>>> not everything.
>>>
>>> Duncan Murdoch
>>>
>>>>> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
>>>>>
>>>>>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
>>>>>> I know how to get the output I need, but I would benefit from an
>>>>>> explanation why R behaves the way it does.
>>>>>>
>>>>>> # I have a data frame x:
>>>>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>>>>>> x
>>>>>> # I want to toss rows in x that contain values >=6. But I don't
>>>>>> # want to toss my NAs there.
>>>>>>
>>>>>> subset(x,c<6)         # Works correctly, but removes NAs in c, understand why
>>>>>> x[which(x$c<6),]      # Works correctly, but removes NAs in c, understand why
>>>>>> x[-which(x$c>=6),]    # output I need
>>>>>>
>>>>>> # Here is my question: why does the following line replace the
>>>>>> # values of all rows that contain an NA in x$c with NAs?
>>>>>>
>>>>>> x[x$c<6,]   # Leaves rows with c=NA, but makes the whole row an NA. Why???
>>>>>> x[(x$c<6) | is.na(x$c),]   # output I need - I have to be super-explicit
>>>>>>
>>>>>> Thank you very much!
>>>>>
>>>>> Most of your examples (except the ones using which()) are doing
>>>>> logical indexing. In logical indexing, TRUE keeps a line, FALSE
>>>>> drops the line, and NA returns NA. Since "x$c < 6" is NA if x$c is
>>>>> NA, you get the third kind of indexing.
>>>>>
>>>>> Your last example works because in the cases where x$c is NA, it
>>>>> evaluates NA | TRUE, and that evaluates to TRUE. In the cases where
>>>>> x$c is not NA, you get x$c < 6 | FALSE, and that's the same as
>>>>> x$c < 6, which will be either TRUE or FALSE.
>>>>>
>>>>> Duncan Murdoch
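The three indexing behaviours described in this thread can be checked side by side. The sketch below reuses the data frame and the `is.true` helper quoted above; the variable names (`cond`, `with_na_rows`, and so on) are introduced here only for illustration.

```r
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

# Helper from the thread: TRUE only where the condition is known to be TRUE.
is.true <- function(x) !is.na(x) & x

cond <- x$c < 6                          # 3 TRUE, 2 FALSE, 5 NA

with_na_rows <- x[cond, ]                # each NA index yields an all-NA row
known_only   <- x[is.true(cond), ]       # NA treated as "do not keep"
keep_na      <- x[cond | is.na(x$c), ]   # NA | TRUE is TRUE: rows kept intact

# Pitfall from the thread: negative indexing with an empty which() result.
none_dropped <- x[-which(x$c >= 100), ]  # zero rows, not all ten

nrow(with_na_rows)
nrow(known_only)
nrow(keep_na)
nrow(none_dropped)
```

The last line is the case Duncan warns about: when no entry meets the condition, `-which(...)` is an empty index and selects nothing, whereas the logical forms above degrade gracefully.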
[Rd] Asking for tasks of summer code 2015
Hey everyone: I am a Master's student at Saarland University, Germany, majoring in Bioinformatics. I am interested in statistical learning, which will also be my main work in the future, with the implementation done in R. So I'd like to join Google Summer of Code this year by doing tasks in your community. However, I cannot find out whether any tasks are available for this year; can anyone tell me? Hank Cao
[Rd] Asssistance
Hi to All,

I am building a package in R and whenever I run command R CMD build OAR in the terminal, I get the following error:

* checking for file ‘OAR/DESCRIPTION’ ... OK
* preparing ‘OAR’:
* checking DESCRIPTION meta-information ... ERROR
Malformed Depends or Suggests or Imports or Enhances field.
Offending entries:
  R (>=3.0.2)
Entries must be names of packages optionally followed by '<=' or '>=', white space, and a valid version number in parentheses.

See the information on DESCRIPTION files in section 'Creating R packages' of the 'Writing R Extensions' manual.

This is my first time to build a package using R and it's very hard for me to figure out where the problem is. I kindly call for your assistance in fixing the problem. Below is my function;

bcidata <- read.csv("~/Desktop/Files_for_Package/data.csv"); bcidata

Modelsfunc <- function(bcidata){
  occupancymean.data.frame <- NULL
  for (k in seq(2.5,250,by=2.5)){
    i <- 1000/k
    j <- 500/k
    bcidata$Xgrid <- cut(bcidata$PX, breaks = i, include.lowest = T)
    bcidata$Ygrid <- cut(bcidata$PY, breaks = j, include.lowest = T)
    bcidata$IDgrid <- with(bcidata, interaction(Xgrid,Ygrid))
    bcidata$IDNgrid <- factor(bcidata$IDgrid)
    levels(bcidata$IDgrid) <- seq_along(levels(bcidata$IDgrid))
    bcidata$count <- ave(bcidata$PX, bcidata$IDgrid, FUN = length)
    aggregate <- aggregate(bcidata$PX,bcidata[,c("Xgrid","Ygrid","IDNgrid")], FUN = length)
    Totalgrids <- length(levels(bcidata$IDgrid))
    Occupiedgrids <- length(aggregate$IDNgrid)
    sum <- sum(aggregate$x)
    TotalArea <- 50
    Area <- (1000/i*500/j)
    Occupancy <- (Occupiedgrids/Totalgrids)
    Mean <- length(bcidata$Latin)/(Occupiedgrids)
    Variance <- var(aggregate$x)
    occupancymean.data.frame <- rbind(occupancymean.data.frame,
      data.frame(Area, Totalgrids, Occupiedgrids, Occupancy, Mean, Variance))
  }
  occupancymean.data.frame
  Occupancy <- occupancymean.data.frame$Occupancy
  Mean <- occupancymean.data.frame$Mean
  poission <- nls(Occupancy ~ 1-exp(-rho*Mean), start = list(rho = 2.1), data = occupancymean.data.frame)
  nachman <- nls(Occupancy ~ 1-exp(-alpha*Mean^beta), start = list(alpha = 0.2, beta = 0.1), data = occupancymean.data.frame)
  logistic <- nls(Occupancy ~ (alpha*Mean^beta)/(1+alpha*Mean^beta), start = list(alpha = 0.2, beta = 0.1),data = occupancymean.data.frame)
  nbd <- nls(Occupancy ~ 1-(1+(Mean)/k)^-k, start = list(k = 1), data = occupancymean.data.frame)
  power <- nls(Occupancy ~ alpha*Mean^beta, start = list(alpha = 0.2, beta= 0.1), data = occupancymean.data.frame)
  inbd <- nls(Occupancy ~ 1-(alpha*(Mean^(beta-1)))^(Mean/(1-alpha*Mean^(beta-1))), start = list(alpha = 0.2, beta = 0.3), data = occupancymean.data.frame)
  fnbd <- nls(Occupancy ~ 1- (gamma(N + k/(Mean*A/N)-k)*gamma(k/(Mean*A/N)))/(gamma(k/(Mean*A/N)-k)*gamma(N+k/(Mean*A/N))), start = list(k = 0.2, A = 0.1, N = 0.2), data = occupancymean.data.frame)
  bayesianII <- nls(Occupancy ~ 1-(theta*beta^(2*(TotalArea*Mean/sum)^0.5)*delta^(TotalArea*Mean/sum)), start = list(theta=0.9956, beta=1, delta=1), data = occupancymean.data.frame)
  return(list(summary(poission), summary(nachman), summary(logistic), summary(nbd), summary(power), summary(inbd), summary(fnbd), summary(bayesianII)))
}

Modelsfunc(bcidata)

Your assistance will be highly appreciated. Thanks in advance.

Regards,

Evans Ochiaga
African Institute for Mathematical Sciences
6 Melrose Road
Muizenberg, South Africa
Msc in Mathematical Sciences
+27 84 61 69 183

When I cannot understand my Father’s leading, And it seems to be but hard and cruel fate, Still I hear that gentle whisper ever pleading, God is working, God is faithful—Only wait.
Re: [Rd] Asssistance
On 03/03/2015 5:47 AM, Evans Otieno Ochiaga wrote:

> Hi to All,
>
> I am building a package in R and whenever I run command R CMD build OAR
> in the terminal, I get the following error:
>
> * checking for file ‘OAR/DESCRIPTION’ ... OK
> * preparing ‘OAR’:
> * checking DESCRIPTION meta-information ... ERROR
> Malformed Depends or Suggests or Imports or Enhances field.
> Offending entries:
>   R (>=3.0.2)
> Entries must be names of packages optionally followed by '<=' or '>=',
> white space, and a valid version number in parentheses.

That looks okay; I'm guessing it is out of place. Can you show us your DESCRIPTION file?

Duncan Murdoch

> See the information on DESCRIPTION files in section 'Creating R
> packages' of the 'Writing R Extensions' manual.
>
> This is my first time to build a package using R and it's very hard for
> me to figure out where the problem is. I kindly call for your assistance
> in fixing the problem.
Re: [Rd] Asssistance
Hi Evans,

> * checking for file ‘OAR/DESCRIPTION’ ... OK
> * preparing ‘OAR’:
> * checking DESCRIPTION meta-information ... ERROR
> Malformed Depends or Suggests or Imports or Enhances field.
> Offending entries:
>   R (>=3.0.2)
> Entries must be names of packages optionally followed by '<=' or '>=',
> white space, and a valid version number in parentheses.

The _white space_ (see explanation above) seems to be missing. Try

  R (>= 3.0.2)

Best,
/g
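The suggested fix can be tried out without building the package. The sketch below writes a throwaway DESCRIPTION file, reads the Depends field back with `read.dcf()`, and compares the corrected and the offending entry against a pattern. Two things here are assumptions for illustration only: the `Version: 0.1` line (the real file was never shown in the thread), and the regular expression `entry_pattern`, which merely paraphrases the rule quoted in the error message and is not the check that R CMD build performs.

```r
# Throwaway DESCRIPTION with the corrected Depends entry.
# "Version: 0.1" is a made-up placeholder.
desc_file <- tempfile("DESCRIPTION")
writeLines(c("Package: OAR",
             "Version: 0.1",
             "Depends: R (>= 3.0.2)"),
           desc_file)

depends <- unname(read.dcf(desc_file, fields = "Depends")[1, "Depends"])

# Hypothetical pattern paraphrasing the error message: a name, then
# '<=' or '>=', white space, and a version number in parentheses.
entry_pattern <- "^[[:alnum:].]+ \\((<=|>=) [0-9]+([.-][0-9]+)*\\)$"

fixed_ok  <- grepl(entry_pattern, depends)         # entry with white space
broken_ok <- grepl(entry_pattern, "R (>=3.0.2)")   # entry from the error

unlink(desc_file)
```

Only the presence of the space after `>=` differs between the two entries, which matches the diagnosis given in the reply.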