Re: [Rd] Import data set from another package?

2015-03-03 Thread Prof Brian Ripley

On 02/03/2015 22:48, Therneau, Terry M., Ph.D. wrote:

I've moved nlme from Depends to Imports in my coxme package. However, a
few of the examples for lmekin use one of the data sets from nlme.  This
is on purpose, to show how the results are the same and how they differ.

  If I use  data(nlme::ergoStool)  the data is not found,
data(nlme:::ergoStool) does no better.
  If I add importFrom(nlme, ergoStool) the error message is that
ergoStool is not exported.

There likely is a simple way, but I currently don't see it.


There were some off-the-mark suggestions in this thread.  If you just
want a dataset from a package, use


data(ergoStool, package = "nlme")

In particular, it is somewhat wasteful to load a large namespace like
nlme when it is not needed.
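A minimal sketch of that advice as it might look in an example section. It passes the dataset name as a string and uses the envir argument of data() so the object lands in a private environment instead of the workspace. It assumes nlme, a recommended package shipped with standard R builds, is installed; the name ergo_env is illustrative only.

```r
# Load nlme's ergoStool dataset without attaching nlme to the search path.
# Assumes the recommended package nlme is installed.
ergo_env <- new.env()
data("ergoStool", package = "nlme", envir = ergo_env)

ergoStool <- ergo_env$ergoStool
str(ergoStool)
```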



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-devel does not update the C++ returned variables

2015-03-03 Thread Martin Maechler
 Hervé Pagès hpa...@fredhutch.org
 on Mon, 2 Mar 2015 13:00:47 -0800 writes:

 Hi,
 On 03/02/2015 12:18 PM, Dénes Tóth wrote:
 
 
 On 03/02/2015 04:37 PM, Martin Maechler wrote:
 
 On 2 March 2015 at 09:09, Duncan Murdoch wrote:
 | I generally recommend that people use Rcpp, which hides a lot of the
 | details.  It will generate your .Call calls for you, and generate the
 | C++ code that receives them; you just need to think about the real
 | problem, not the interface.  It has its own learning curve, but I think
 | it is easier than using the low-level code that you need to work with .Call.
 
 Thanks for that vote, and I second that.
 
 And these days the learning curve is a lot flatter than it was a decade ago:
 
R> Rcpp::cppFunction("NumericVector doubleThis(NumericVector x) { return(2*x); }")
R> doubleThis(c(1,2,3,21,-4))
 [1]  2  4  6 42 -8
R>
 
 That defined, compiled, loaded and run/illustrated a simple function.
 
 Dirk
 
 Indeed impressive ... and it also works with integer vectors,
 something not 100% trivial when working with compiled code.
 
 When testing that, I went a step further:

 ## now test:
 require(microbenchmark)
 i <- 1:10
 
 Note that the relative speed of the algorithms also depends on the size
 of the input vector: i + i becomes the winner for longer vectors (e.g.
 i <- 1:1e6), but a proper Rcpp version is still approximately twice as fast.

 The difference in speed is probably due to the fact that R does safe
 arithmetic. C or C++ do not:

 > doubleThisInt(i)
 [1]  2147483642  2147483644  2147483646          NA -2147483646 -2147483644

 > 2L * i
 [1] 2147483642 2147483644 2147483646         NA         NA         NA
 Warning message:
 In 2L * i : NAs produced by integer overflow

 H.

Exactly, excellent, Hervé!

Luke also told me so in a private message,
and that 'i+i' looks up 'i' twice, which is relatively costly
for very small i, as in my example.

This (no safe integer arithmetic in C, but in R) is another
good example {like Martin Morgan's} of why using
Rcpp -- or .Call() directly -- may be too sharp-edged a sword and
maybe should be advocated for good programmers only.

Martin
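To make "safe arithmetic" concrete: R checks each integer result against the representable range and returns NA (with a warning) on overflow, whereas a plain C/C++ multiplication silently wraps around. A minimal R sketch of the check a compiled loop would have to replicate is below; safe_double_int() is a hypothetical helper, not part of R or Rcpp, and the input 1073741821:1073741826 is an assumption chosen because it reproduces the doubled values shown in this thread.

```r
# Hypothetical helper: double an integer vector the way R's 2L * x does,
# returning NA wherever the true result falls outside R's integer range
# (|value| <= .Machine$integer.max), but without emitting a warning.
safe_double_int <- function(x) {
  stopifnot(is.integer(x))
  limit <- .Machine$integer.max %/% 2L   # largest |x| whose double still fits
  out <- rep(NA_integer_, length(x))
  ok <- !is.na(x) & abs(x) <= limit
  out[ok] <- 2L * x[ok]
  out
}

i <- 1073741821:1073741826
safe_double_int(i)
suppressWarnings(2L * i)   # R's own checked arithmetic, for comparison
```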


 
 Rcpp::cppFunction("NumericVector doubleThisNum(NumericVector x) { return(2*x); }")
 Rcpp::cppFunction("IntegerVector doubleThisInt(IntegerVector x) { return(2*x); }")
 i <- 1:1e6
 mb <- microbenchmark::microbenchmark(doubleThisNum(i), doubleThisInt(i),
   i*2, 2*i, i*2L, 2L*i, i+i, times=100)
 plot(mb, log="y", notch=TRUE)
 
 
 (mb <- microbenchmark(doubleThis(i), i*2, 2*i, i*2L, 2L*i, i+i,
                       times=2^12))
 ## Lynne (i7; FC 20), R Under development ... (2015-03-02 r67924):
 ## Unit: nanoseconds
 ##   expr min  lq  mean median   uq   max neval cld
 ##  doubleThis(i) 762 985 1319.5974   1124 1338 17831  4096   b
 ##  i * 2 124 151  258.4419164  221 4  4096  a
 ##  2 * i 127 154  266.4707169  216 20213  4096  a
 ## i * 2L 143 164  250.6057181  234 16863  4096  a
 ## 2L * i 144 177  269.5015193  237 16119  4096  a
 ##  i + i 152 183  272.6179199  243 10434  4096  a
 
 plot(mb, log="y", notch=TRUE)
 ## hmm, looks like even the simple arithm. differ slightly ...
 ##
 ## == zoom in:
 plot(mb, log="y", notch=TRUE, ylim = c(150,300))

 dev.copy(png, file="mbenchm-doubling.png")
 dev.off() # [ <- why do I need this here for png ??? ]
 ##-- see the appended *png graphic
 
 Those who have learnt about boxplot notches, from EDA or elsewhere, will
 know that they provide somewhat informal but robust pairwise tests at
 approximately the 5% level.
 From these, one *could* (possibly wrongly) conclude that
 'i * 2' is significantly faster than both 'i * 2L' and
 'i + i', which I find astonishing, given that i is an integer vector here...
 
 Probably no reason for deep thoughts here, but if someone is
 enticed, this may be slightly interesting to read.
 
 Martin Maechler, ETH Zurich
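For reference, the notch that boxplot() draws (via boxplot.stats()) extends 1.58 times the hinge spread divided by sqrt(n) on either side of the median, and non-overlapping notches are the informal pairwise test referred to here. A minimal sketch that recomputes the interval by hand and checks two samples for overlap; notch_interval() and notches_overlap() are hypothetical helper names, and the "timings" are simulated, not the benchmark results above.

```r
# Notch interval as computed by boxplot.stats():
# median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n)
notch_interval <- function(x) {
  x <- x[!is.na(x)]
  h <- fivenum(x)   # min, lower hinge, median, upper hinge, max
  h[3] + c(-1.58, 1.58) * (h[4] - h[2]) / sqrt(length(x))
}

# Overlapping notches: no evidence (at roughly the 5% level) that medians differ
notches_overlap <- function(x, y) {
  a <- notch_interval(x)
  b <- notch_interval(y)
  a[1] <= b[2] && b[1] <= a[2]
}

set.seed(42)
fast <- rnorm(200, mean = 160, sd = 20)   # simulated timings
slow <- rnorm(200, mean = 230, sd = 20)
notch_interval(fast)
notches_overlap(fast, slow)
```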
 
 
 

 -- 
 Hervé Pagès

 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B514
 P.O. Box 19024
 Seattle, WA 98109-1024

 E-mail: hpa...@fredhutch.org
 Phone:  (206) 667-5791
 Fax:(206) 667-1319


Re: [Rd] Import data set from another package?

2015-03-03 Thread Therneau, Terry M., Ph.D.

As I expected: there was something simple and obvious, which I somehow could 
not see.
Thanks for the pointer.

Terry T.




[Rd] Feature request: copy attributes in gzcon

2015-03-03 Thread Jeroen Ooms
The `gzcon` function both modifies and copies a connection object:

  # compressed text
  con1 <- url("http://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz")
  con2 <- gzcon(con1)

  # almost indistinguishable
  con1==con2
  identical(summary(con2), summary(con1))

  # both support gzip
  readLines(con1, n = 3)
  readLines(con2, n = 3)

  # opening one opens both
  isOpen(con2)
  open(con1)
  isOpen(con2)

In the example, `con1` and `con2` are two different objects
interfacing the same connection. It might seem as if gzcon has simply
returned the modified connection object, but the documentation
explains that it in fact creates a copy referencing the same
connection but with a modified internal structure.

It is unclear to me how `con1` is different from `con2`, but given
that they represent one and the same connection, would it be possible
to make gzcon copy over attributes from the input connection to the
output object?

This would allow custom connection implementations such as the curl
package to use attributes for storing additional metadata about the
connection. Currently those attributes get dropped after calling gzcon
on the connection:

  library(curl)
  con <- curl("http://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz")
  attr(con, "foo") <- "bar"

  con <- gzcon(con)
  attr(con, "foo")

It would be very helpful if gzcon would instead copy attributes onto
the output object, such that any potential meta-data about the
connection as stored in attributes gets retained.
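Until something like this exists in base R, one user-level workaround is to copy the attributes across by hand. A minimal sketch, assuming the only attributes gzcon() itself sets on its result are "class" and "conn_id" (which must not be overwritten); gzcon_keep_attrs() is a hypothetical helper, and a local gzip file stands in for the remote URL so the example runs offline.

```r
# Hypothetical wrapper: call gzcon() and carry over any extra attributes
# (e.g. metadata stored by a custom connection package) from the input.
gzcon_keep_attrs <- function(con, ...) {
  old <- attributes(con)
  new <- gzcon(con, ...)
  extra <- setdiff(names(old), c("class", "conn_id"))
  for (a in extra) attr(new, a) <- old[[a]]
  new
}

# Offline stand-in for the remote .gz file
tf <- tempfile(fileext = ".gz")
gz <- gzfile(tf, "w")
writeLines(c("line 1", "line 2"), gz)
close(gz)

con <- file(tf, "rb")
attr(con, "foo") <- "bar"
con <- gzcon_keep_attrs(con)
attr(con, "foo")   # "bar" is retained
readLines(con)
close(con)
```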



Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Hervé Pagès



On 03/03/2015 02:28 AM, Martin Maechler wrote:

Diverted from R-help :
 as it gets into musing about new R language primitives


William Dunlap wdun...@tibco.com
 on Fri, 27 Feb 2015 08:04:36 -0800 writes:


  You could define functions like

  is.true <- function(x) !is.na(x) & x
  is.false <- function(x) !is.na(x) & !x

  and use them in your selections.  E.g.,
  > x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
  > x[is.true(x$c >= 6), ]
      a  b  c
  7   7  8  7
  10 10 11 10

  Bill Dunlap
  TIBCO Software
  wdunlap tibco.com
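A small self-contained comparison of the behaviours discussed in this thread, using Bill's example data: a logical index containing NA produces all-NA rows, while is.true(), which() and subset() all treat NA as "not selected". The variable names are illustrative only.

```r
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))
is.true <- function(x) !is.na(x) & x

sel <- x$c >= 6                 # TRUE, FALSE or NA per row
r_na    <- x[sel, ]             # NA entries in 'sel' give rows of all NA
r_true  <- x[is.true(sel), ]    # NA treated as FALSE
r_which <- x[which(sel), ]      # which() drops NA positions
r_sub   <- subset(x, c >= 6)    # subset() also drops rows where the test is NA

nrow(r_na)      # 2 selected rows plus 5 all-NA rows
nrow(r_true)
```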

Yes; the Matrix package has had these

is0  <- function(x) !is.na(x) & x == 0
isN0 <- function(x)  is.na(x) | x != 0
is1  <- function(x) !is.na(x) & x   # also == isTRUE componentwise


Note that using %in% to block propagation of NAs is about 2x faster:

> x <- sample(c(NA_integer_, 1:1), 50, replace=TRUE)
> microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x)
Unit: milliseconds
                    expr       min        lq      mean   median        uq      max neval
 as.logical(x) %in% TRUE  6.034744  6.264382  6.999083  6.29488  6.346028 40.36472   100
           !is.na(x) & x 11.202808 11.402437 11.469101 11.44848 11.517576 11.90916   100





namespace hidden for a while  [note the comment of the last one!]
and using them for readability in its own code.

Maybe we should (again) consider providing some versions of
these with R ?

The Matrix package also has had fast

allFalse <- all0 <- function(x) .Call(R_all0, x)
anyFalse <- any0 <- function(x) .Call(R_any0, x)
##
## anyFalse <- function(x) isTRUE(any(!x))      ## ~= any0
## any0     <- function(x) isTRUE(any(x == 0))  ## ~= anyFalse

namespace hidden as well, already, which probably could also be
brought to base R.

One big reason to *not* go there (to internal C code) at all with R is that
S3 and S4 dispatch for '==' ('!=', etc., the 'Compare' group generics)
and is.na() has long been available, and package writers have
programmed methods for these.
Ensuring that S3 and S4 dispatch also works correctly inside
such new internals is much less easily achieved, and so
such a C-based internal function is0() would no longer be
equivalent to  !is.na(x) & x == 0
as soon as 'x' is an object with a '==', 'Compare' and/or an is.na() method.
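A minimal illustration of this point: an R-level is0() automatically picks up a class's own '==' method, which a C implementation comparing the raw doubles would bypass. The class "approx" and its tolerance-based '==' method are hypothetical, made up for this sketch.

```r
is0 <- function(x) !is.na(x) & x == 0

# Hypothetical S3 class whose '==' compares with an absolute tolerance
`==.approx` <- function(e1, e2) abs(unclass(e1) - unclass(e2)) < 1e-8

x <- structure(c(1e-10, 1, NA), class = "approx")
is0(x)            # dispatches to `==.approx`: the tiny value counts as zero
is0(unclass(x))   # what a method-blind implementation would compute
```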


Excellent point. Thank you! It really makes a big difference for
developers who maintain a complex hierarchy of S4 classes and methods
when functions like is.true, anyFalse, etc., which can be expressed in
terms of more basic operations like ==, !=, !, is.na, etc., just work
out-of-the-box on objects for which these basic operations are defined.

There is conceptually a small set of building blocks, at least for
objects with vector-like or list-like semantics, that can be used
to formally describe the semantics of many functions in base R. This
is what the man page for anyNA does by saying:

  anyNA implements any(is.na(x))

The actual implementation differs, but that's OK as long
as anyNA is equivalent to doing any(is.na(x)) on any object for which
the building block is.na() is implemented.

Unfortunately there is no clearly identified set of building blocks
in base R. For example, if I want the comparison operations to work
on my object, I need to implement ==, >, <, !=, <=, and >= (the
'Compare' group generics) even though it should be enough to implement
== and <=, because all the others can be described in terms of these
2 building blocks. unique/duplicated is another example (unique(x) is
conceptually x[!duplicated(x)]). And so on...
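The building-block idea can be sketched concretely: given only an equality and a less-than-or-equal operation, and assuming a total order, the other four 'Compare' operators follow mechanically. In this minimal S3 sketch, derive_compare(), the class "revnum" (numbers ordered in reverse) and its Ops method are hypothetical names, not an existing R API. The last two lines check the unique()/duplicated() identity on a plain vector.

```r
# Build all six 'Compare' operators from two building blocks, eq and le,
# assuming le() is a total order consistent with eq().
derive_compare <- function(eq, le) {
  list(
    "==" = eq,
    "<=" = le,
    "!=" = function(a, b) !eq(a, b),
    "<"  = function(a, b) le(a, b) & !eq(a, b),
    ">=" = function(a, b) le(b, a),
    ">"  = function(a, b) le(b, a) & !eq(a, b)
  )
}

# Hypothetical class "revnum": numbers ordered in reverse. Only eq and le
# are written by hand; the other four operators are derived from them.
revnum_ops <- derive_compare(
  eq = function(a, b) unclass(a) == unclass(b),
  le = function(a, b) unclass(a) >= unclass(b)
)

Ops.revnum <- function(e1, e2) {
  if (!(.Generic %in% names(revnum_ops)))
    stop(sprintf("'%s' is not defined for class \"revnum\"", .Generic))
  revnum_ops[[.Generic]](e1, e2)
}

a <- structure(c(1, 2, 3), class = "revnum")
b <- structure(c(3, 2, 1), class = "revnum")
a < b   # under the reversed order only 3 "<" 1 holds

# The unique()/duplicated() identity on a plain vector
v <- c(3, 1, 3, 2, 1)
identical(unique(v), v[!duplicated(v)])
```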

Cheers,
H.



OTOH, simple R versions such as your 'is.true' (called 'is1'
inside Matrix) may be optimizable a bit by the byte compiler (and
JIT and other such tricks) while still keeping the full
semantics, including correct method dispatch.

Martin Maechler, ETH Zurich


  On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski 
  dimitri.liakhovit...@gmail.com wrote:

  Thank you very much, Duncan.
  All this being said:
 
  What would you say is the most elegant and most safe way to solve such
  a seemingly simple task?
 
  Thank you!
 
  On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
  murdoch.dun...@gmail.com wrote:
   On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
   So, Duncan, do I understand you correctly:
  
   When I use x$x6, R doesn't know if it's TRUE or FALSE, so it returns
   a logical value of NA.
  
   Yes, when x$x is NA.  (Though I think you meant x$c.)
  
   When this logical value is applied to a row, the R says: hell, I don't
   know if I should keep it or not, so, just in case, I am going to keep
   it, but I'll replace all the values in this row with NAs?
  
   Yes.  Indexing with a logical NA is probably a mistake, and this is one
   way to signal it without actually triggering a warning or error.
  
   BTW, I should have mentioned that the example 

Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Stephanie M. Gogarten



On 3/3/15 1:26 PM, Hervé Pagès wrote:

Note that using %in% to block propagation of NAs is about 2x faster:

Unfortunately %in% does not preserve matrix dimensions:

> x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
> dim(x)
[1] 50 10
> dim(!is.na(x) & x)
[1] 50 10
> dim(as.logical(x) %in% TRUE)
NULL

Stephanie








Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Gabriel Becker
Stephanie,

Actually, it's as.logical that isn't preserving matrix dimensions, because
it coerces to a logical vector:

> x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
> dim(as.logical(x))
NULL

~G


Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Hervé Pagès



On 03/03/2015 02:17 PM, Gabriel Becker wrote:

Stephanie,

Actually, it's as.logical that isn't preserving matrix dimensions,
because it coerces to a logical vector:

> x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
> dim(as.logical(x))


It's true, as.logical() doesn't help here, but Stephanie is right: %in%
does not preserve the dimensions either:

> dim(x %in% 1:5)
NULL

That's because match() itself doesn't preserve the dimensions:

> dim(match(x, 1:5))
NULL

So maybe my fast is.true() should be:

is.true <- function(x)
{
  ans <- as.logical(x) %in% TRUE
  if (is.null(dim(x))) {
    names(ans) <- names(x)
  } else {
    dim(ans) <- dim(x)
    dimnames(ans) <- dimnames(x)
  }
  ans
}

or something like that...
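A quick self-contained check of that revised is.true() (restated so the snippet runs on its own): it should agree element-wise with !is.na(x) & x while keeping dim/dimnames for matrices and names for plain vectors. The test data are made up for the illustration.

```r
is.true <- function(x)
{
  ans <- as.logical(x) %in% TRUE
  if (is.null(dim(x))) {
    names(ans) <- names(x)
  } else {
    dim(ans) <- dim(x)
    dimnames(ans) <- dimnames(x)
  }
  ans
}

set.seed(1)
m <- matrix(sample(c(NA_integer_, 0:3), 12, replace = TRUE), nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), NULL))
res <- is.true(m)
dim(res)                                      # same as dim(m): 3 4
identical(as.vector(res), as.vector(!is.na(m) & m))

v <- c(a = NA, b = 1L, c = 0L)
is.true(v)                                    # names are kept
```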

H.



Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-03 Thread Peter Haverty



  I still think GRanges should be a subclass of DataFrame,
 which would make this easy, but I don't seem to be winning that argument.


 Just impossible. As Michael mentioned back in November, they have
 conflicting APIs.


Maybe a new GRangesFrame that is a DataFrame and holds a GRanges
(without mcols) as an index?



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-03 Thread Michael Lawrence
Seems like rowData could be made to work universally through coercion.
rowRanges would not, however, and one would like a convenient mechanism to
condition on whether range information is available. One way is to
introduce a new class and rely on dispatch. But that adds complexity.



Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-03 Thread Tim Triche, Jr.
This.

It would be damned near perfect as a return value for assays coming out of
an object that held several such assays at several time points in a
population, where there are both assay-wise and covariate-wise holes that
could nonetheless be usefully imputed across assays.


Statistics is the grammar of science.
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty haverty.pe...@gene.com
wrote:

 
 
 


 Maybe a new GRangesFrame that is a DataFrame and holds a GRanges
 (without mcols) as an index?






Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-03 Thread Gabe Becker
Jim et al.,

Why have two accessors (rowRanges, rowData), each of which is less
flexible than the underlying structure and thus will fail (return NULL? or
GRanges()/DataFrame()?) in some proportion of valid objects?

~G

On Tue, Mar 3, 2015 at 2:37 PM, Jim Hester james.f.hes...@gmail.com wrote:

 Motivated by the discussion thread from November
 (https://stat.ethz.ch/pipermail/bioc-devel/2014-November/006686.html),
 the Bioconductor core team
 is planning on making changes to the SummarizedExperiment class.  Our end
 goal is to allow the @rowData slot to become more flexible and hold either
 a DataFrame or GRanges type object.

 To this end we have deprecated the current rowData accessor in
 favor of a rowRanges accessor.  This change has resulted in a few broken
 builds in devel, which we are in the process of fixing now.  We will
 contact any package authors directly if needed for this migration.

 The rowData accessor will be deprecated in this release; eventually, however,
 the plan is to re-purpose this function to serve as an accessor for
 DataFrame data on the rows.

 Please let us know if you have any questions with the above and if you need
 any assistance with the transition.
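For package authors following along, the usual base-R way to stage this kind of change is to keep the old accessor for a release as a thin wrapper that calls .Deprecated() and forwards to the new one. A minimal sketch with hypothetical names (getRowRanges(), getRowData() and a list-based object), not the actual SummarizedExperiment code:

```r
# New accessor (hypothetical; stands in for rowRanges())
getRowRanges <- function(x) x$ranges

# Old accessor kept for one release: warns, then forwards to the new one
getRowData <- function(x) {
  .Deprecated("getRowRanges")
  getRowRanges(x)
}

obj <- list(ranges = data.frame(start = c(1L, 10L), end = c(5L, 15L)))
res <- suppressWarnings(getRowData(obj))   # normally emits a deprecation warning
identical(res, getRowRanges(obj))
```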





-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-03 Thread Hervé Pagès

On 03/03/2015 03:06 PM, Peter Haverty wrote:

I'd like to see a basic class that takes a DataFrame and a sub-class that
takes a GRanges.


Yes.


I still think GRanges should be a subclass of DataFrame,
which would make this easy, but I don't seem to be winning that argument.


Just impossible. As Michael mentioned back in November, they have
conflicting APIs.



While the hood is up, can we try some different names?
SummarizedExperiment never seemed like a great fit to me because it doesn't
necessarily contain experiments or summaries thereof.  It's a collection of
like-sized rectangular things with metadata on the two dimensions.  Maybe
the name could reflect what it holds rather than a common use case?
AnnotatedMatrixList?


We actually need 2 names: 1 for the parent class, 1 for the child. I'm
starting to think that introducing 2 new names would maybe make the
migration a little bit easier, especially since the plan is to move the
refactored SummarizedExperiment to its own package. With 2 new names
we can start the new package, implement the 2 new classes in it, and
have the old SummarizedExperiment (in GenomicRanges) and the 2 new
classes peacefully cohabit during the time of the migration.

Cheers,
H.



  Anyway, I'm excited to see a version on the way that takes a DataFrame as
rowData.  I'm glad you guys are working on that.

Regards,

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Tue, Mar 3, 2015 at 2:57 PM, Michael Lawrence lawrence.mich...@gene.com
wrote:


Seems like rowData could be made to work universally through coercion.
rowRanges would not, however, and one would like a convenient mechanism to
condition on whether range information is available. One way is to
introduce a new class and rely on dispatch. But that adds complexity.

On Tue, Mar 3, 2015 at 2:44 PM, Gabe Becker becker.g...@gene.com wrote:


Jim et al.,

Why have two accessors (rowRanges, rowData), each of which are less
flexible than the underlying structure and thus will fail (return NULL? or
GRanges()/DataFrame() ?) in some proportion of valid objects?

~G

On Tue, Mar 3, 2015 at 2:37 PM, Jim Hester james.f.hes...@gmail.com
wrote:


Motivated by the discussion thread from November
(https://stat.ethz.ch/pipermail/bioc-devel/2014-November/006686.html) the
Bioconductor core team is planning on making changes to the
SummarizedExperiment class.  Our end goal is to allow the @rowData slot to
become more flexible and hold either a DataFrame or GRanges type object.

To this end we have currently deprecated the current rowData accessor in
favor of a rowRanges accessor.  This change has resulted in a few broken
builds in devel, which we are in the process of fixing now.  We will
contact any package authors directly if needed for this migration.

The rowData accessor will be deprecated in this release, however eventually
the plan is to re-purpose this function to serve as an accessor for
DataFrame data on the rows.

Please let us know if you have any questions with the above and if you need
any assistance with the transition.






--
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research










--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-03 Thread Peter Haverty
There are some nice similarities in these new imaginary types.  A
GRangesFrame is a list of dimensionally identical things (columns) and
some row meta-data (the GRanges).  The SE-like object is similarly a list
of dimensionally like things (matrices, RleDataFrames, BigMatrix objects,
HDF5-backed things) with some row meta-data (a DataFrame or GRangesFrame).
Elegant?  Maybe they would actually be relatives in the class tree.

I wonder if this kind of thing would be easier if we had Java-style
Interfaces or duck-typing.  The x slot of y holds something that
implements this set of methods ...

Oh, and kinda apropos, the genoset class will probably go away or become an
extension to this new SE-like thing.  The extra stuff that comes along with
genoset will still be available.

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr. tim.tri...@gmail.com
wrote:

 This.

 It would be damned near perfect as a return value for assays coming out of
 an object that held several such assays at several time points in a
 population, where there are both assay-wise and covariate-wise holes that
 could nonetheless be usefully imputed across assays.


 Statistics is the grammar of science.
 Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

 On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty haverty.pe...@gene.com
 wrote:

  
  
  
I still think GRanges should be a subclass of DataFrame,
   which would make this easy, but I don't seem to be winning that
  argument.
  
  
   Just impossible. As Michael mentioned back in November, they have
   conflicting APIs.
 
 
  Maybe a new GRangesFrame that is a DataFrame and holds a GRanges
  (without mcols) as an index?
 
 
 




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Rd] Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015-03-03 Thread Winston Chang
After a bit more investigation, I think I've found the cause of the bug,
and I have a patch.

This bug happens with grep(), when:
* Running on Windows.
* The search uses fixed=TRUE.
* The search pattern is a single byte.
* The current locale has a multibyte encoding.

===
Here's an example that demonstrates the bug:

# First, create a 3-byte UTF-8 character
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"
y
# [1] "渗"

# In my default locale, grep with a single-char pattern and fixed=TRUE
# returns integer(0), as expected.
Sys.getlocale("LC_CTYPE")
# [1] "English_United States.1252"
grep("a", y, fixed = TRUE)
# integer(0)

# When using a multibyte locale, grep with a single-char
# pattern and fixed=TRUE results in an error.
Sys.setlocale("LC_CTYPE", "chinese")
grep("a", y, fixed = TRUE)
# Error in grep("a", y, fixed = TRUE) : invalid multibyte string at '<97>'


===

I believe the problem is in the main/grep.c file, in the fgrep_one
function. It tests for a multi-byte character string locale
`mbcslocale`, and then for the `use_UTF8`, like so:

if (!useBytes && mbcslocale) {
...
} else if (!useBytes && use_UTF8) {
...
} else ...

This can be seen at
https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L668-L692

A similar pattern occurs in the fgrep_one_bytes function, at
https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L718-L736


I believe that the test order should be reversed; it should test first
for `use_UTF8`, and then for `mbcslocale`. This pattern occurs in a few
places in grep.c. It looks like this:

if (!useBytes && use_UTF8) {
...
} else if (!useBytes && mbcslocale) {
...
} else ...


===
This patch does what I described; it simply tests for `use_UTF8` first,
and then `mbcslocale`, in both fgrep_one and fgrep_one_bytes. I made
this patch against the 3.1.2 sources, and tested the example code above.
In both cases, grep() returned integer(0), as expected.
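A minimal regression check along the lines of the example above might look like the sketch below. It runs in whatever locale the session already uses (it does not switch to the Windows-specific "chinese" locale), so on its own it cannot reproduce the failure. The `useBytes = TRUE` call is an assumed workaround for affected builds, based only on the fact that both character-stepping branches discussed here are guarded by `!useBytes`.

```r
# Same 3-byte UTF-8 character as in the report
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"

# A single-byte fixed pattern that does not occur in y should give no match
hit <- grep("a", y, fixed = TRUE)

# Assumed workaround: with useBytes = TRUE neither of the
# '!useBytes && ...' character-stepping branches is taken
hit_bytes <- grep("a", y, fixed = TRUE, useBytes = TRUE)

stopifnot(identical(hit, integer(0)), identical(hit_bytes, integer(0)))
```

To exercise the reported failure itself, the same check would have to run on Windows after `Sys.setlocale("LC_CTYPE", "chinese")`.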

(I made this change against 3.1.2 because I had problems getting the
current trunk to compile on either Linux or Windows.)


diff --git src/main/grep.c src/main/grep.c
index 6e6ec3e..348c63d 100644
--- src/main/grep.c
+++ src/main/grep.c
@@ -664,27 +664,27 @@ static int fgrep_one(const char *pat, const char *target,
}
return -1;
 }
-if (!useBytes && mbcslocale) { /* skip along by chars */
-   mbstate_t mb_st;
+if (!useBytes && use_UTF8) {
int ib, used;
-   mbs_init(&mb_st);
for (ib = 0, i = 0; ib <= len-plen; i++) {
if (strncmp(pat, target+ib, plen) == 0) {
if (next != NULL) *next = ib + plen;
return i;
}
-   used = (int) Mbrtowc(NULL,  target+ib, MB_CUR_MAX, &mb_st);
+   used = utf8clen(target[ib]);
if (used <= 0) break;
ib += used;
}
-} else if (!useBytes && use_UTF8) {
+} else if (!useBytes && mbcslocale) { /* skip along by chars */
+   mbstate_t mb_st;
int ib, used;
+   mbs_init(&mb_st);
for (ib = 0, i = 0; ib <= len-plen; i++) {
if (strncmp(pat, target+ib, plen) == 0) {
if (next != NULL) *next = ib + plen;
return i;
}
-   used = utf8clen(target[ib]);
+   used = (int) Mbrtowc(NULL,  target+ib, MB_CUR_MAX, &mb_st);
if (used <= 0) break;
ib += used;
}
@@ -714,21 +714,21 @@ static int fgrep_one_bytes(const char *pat, const char *target, int len,
if (*p == pat[0]) return i;
return -1;
 }
-if (!useBytes && mbcslocale) { /* skip along by chars */
-   mbstate_t mb_st;
+if (!useBytes && use_UTF8) { /* not really needed */
int ib, used;
-   mbs_init(&mb_st);
for (ib = 0, i = 0; ib <= len-plen; i++) {
if (strncmp(pat, target+ib, plen) == 0) return ib;
-   used = (int) Mbrtowc(NULL, target+ib, MB_CUR_MAX, &mb_st);
+   used = utf8clen(target[ib]);
if (used <= 0) break;
ib += used;
}
-} else if (!useBytes && use_UTF8) { /* not really needed */
+} else if (!useBytes && mbcslocale) { /* skip along by chars */
+   mbstate_t mb_st;
int ib, used;
+   mbs_init(&mb_st);
for (ib = 0, i = 0; ib <= len-plen; i++) {
if (strncmp(pat, target+ib, plen) == 0) return ib;
-   used = utf8clen(target[ib]);
+   used = (int) Mbrtowc(NULL, target+ib, MB_CUR_MAX, &mb_st);
if (used <= 0) break;
ib += used;
}


-Winston

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Martin Maechler
Diverted from R-help :
 as it gets into musing about new R language primitives

 William Dunlap wdun...@tibco.com
 on Fri, 27 Feb 2015 08:04:36 -0800 writes:

 You could define functions like

 is.true <- function(x) !is.na(x) & x
 is.false <- function(x) !is.na(x) & !x

 and use them in your selections.  E.g.,
 x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
 x[is.true(x$c >= 6), ]
 a  b  c
 7   7  8  7
 10 10 11 10

 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com

Yes; the Matrix package has had these

is0  <- function(x) !is.na(x) & x == 0
isN0 <- function(x)  is.na(x) | x != 0
is1  <- function(x) !is.na(x) & x   # also == isTRUE componentwise

namespace hidden for a while  [note the comment of the last one!]
and using them for readability in its own code.

Maybe we should (again) consider providing some versions of
these with R ?

The Matrix package also has had fast 

allFalse <- all0 <- function(x) .Call(R_all0, x)
anyFalse <- any0 <- function(x) .Call(R_any0, x)
## 
## anyFalse <- function(x) isTRUE(any(!x))   ## ~= any0
## any0 <- function(x) isTRUE(any(x == 0))   ## ~= anyFalse

namespace hidden as well, already, which probably could also be
brought to base R.

One big reason to *not* go there (to internal C code) at all with R is that
S3 and S4 dispatch for '==' ('!=', etc., the 'Compare' group generics)
and 'is.na()' have long been available and package writers have
programmed methods for these.
Ensuring that S3 and S4 dispatch also works correctly inside
such new internals is much less easily achieved, and so
such a C-based internal function  is0() would no longer be
equivalent to  !is.na(x) & x == 0
as soon as 'x' is an object with a '==', 'Compare' and/or an is.na() method.

OTOH, simple R versions such as your  'is.true',  called 'is1'
inside Matrix, may be optimizable a bit by the byte compiler (and
JIT and other such tricks) and still keep the full
semantics, including correct method dispatch.

Martin Maechler, ETH Zurich
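The dispatch argument can be illustrated with a small sketch. The S3 class `tol` and its `==` method are hypothetical (not part of Matrix or base R): its `==` compares with a tolerance. Because the R-level `is0()` simply calls `==`, the method is found automatically; a C-level replacement would have to arrange that dispatch itself.

```r
# R-level helper, written like the Matrix-internal is0() quoted above
is0 <- function(x) !is.na(x) & x == 0

# Hypothetical S3 class "tol" whose == compares with a tolerance
tol <- function(x) structure(x, class = "tol")
`==.tol` <- function(e1, e2) abs(unclass(e1) - unclass(e2)) < 1e-8

v <- c(1e-10, NA, 1)
is0(v)       # [1] FALSE FALSE FALSE  (exact comparison)
is0(tol(v))  # [1]  TRUE FALSE FALSE  (== dispatches to ==.tol inside is0)
```

Note that the NA element still yields FALSE in both cases, because `!is.na(x)` is evaluated first and `FALSE & NA` is `FALSE`.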


 On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski 
 dimitri.liakhovit...@gmail.com wrote:

 Thank you very much, Duncan.
 All this being said:
 
 What would you say is the most elegant and most safe way to solve such
 a seemingly simple task?
 
 Thank you!
 
 On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
 murdoch.dun...@gmail.com wrote:
  On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
  So, Duncan, do I understand you correctly:
 
  When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
  a logical value of NA.
 
  Yes, when x$x is NA.  (Though I think you meant x$c.)
 
  When this logical value is applied to a row, the R says: hell, I don't
  know if I should keep it or not, so, just in case, I am going to keep
  it, but I'll replace all the values in this row with NAs?
 
  Yes.  Indexing with a logical NA is probably a mistake, and this is one
  way to signal it without actually triggering a warning or error.
 
  BTW, I should have mentioned that the example where you indexed using
  -which(x$c>=6) is a bad idea:  if none of the entries were 6 or more,
  this would be indexing with an empty vector, and you'd get nothing, not
  everything.
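The `-which()` trap can be seen on a made-up toy data frame in which no value reaches 6. A logical mask in the spirit of Bill Dunlap's `is.true()` drops only the rows known to be 6 or more, so it keeps every row here.

```r
# Toy data (illustrative only): no value of c is >= 6
x <- data.frame(a = 1:3, c = c(1, NA, 3))

# which() finds nothing, so it returns integer(0) ...
idx <- which(x$c >= 6)
# ... and -integer(0) is still integer(0): this selects no rows at all
dropped_all <- x[-idx, ]

# Logical mask: drop only rows where c is known to be >= 6
drop <- !is.na(x$c) & x$c >= 6
kept <- x[!drop, ]

nrow(dropped_all)  # 0
nrow(kept)         # 3
```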
 
  Duncan Murdoch
 
 
 
  On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch
  murdoch.dun...@gmail.com wrote:
  On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
  I know how to get the output I need, but I would benefit from an
  explanation why R behaves the way it does.
 
  # I have a data frame x:
  x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
  x
  # I want to toss rows in x that contain values >=6. But I don't want
  # to toss my NAs there.

  subset(x,c<6) # Works correctly, but removes NAs in c, understand why
  x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why
  x[-which(x$c>=6),] # output I need

  # Here is my question: why does the following line replace the values
  # of all rows that contain an NA # in x$c with NAs?

  x[x$c<6,]  # Leaves rows with c=NA, but makes the whole row an NA. Why???
  x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit
 
  Thank you very much!
 
  Most of your examples (except the ones using which()) are doing logical
  indexing.  In logical indexing, TRUE keeps a line, FALSE drops the line,
  and NA returns NA.  Since x$c < 6 is NA if x$c is NA, you get the
  third kind of indexing.

  Your last example works because in the cases where x$c is NA, it
  evaluates NA | TRUE, and that evaluates to TRUE.  In the cases where x$c
  is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6,
  which will be either TRUE or FALSE.
 
  Duncan Murdoch
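The three outcomes Duncan describes (TRUE keeps a row, FALSE drops it, NA yields an all-NA row) can be checked directly on Dimitri's data frame. The counts follow from the ten values of `c`: three are below 6, five are NA, and two are 6 or more.

```r
x <- data.frame(a = 1:10, b = 2:11, c = c(1, NA, 3, NA, 5, NA, 7, NA, NA, 10))

keep <- x$c < 6                   # TRUE, FALSE or NA for each row
res_na <- x[keep, ]               # 3 TRUE + 5 NA entries give 8 rows
sum(!is.na(res_na$a))             # 3: only the TRUE rows keep their data

res_or <- x[keep | is.na(x$c), ]  # NA | TRUE is TRUE, so NA rows survive intact
res_or$a                          # [1] 1 2 3 4 5 6 8 9
```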

[Rd] Asking for tasks of summer code 2015

2015-03-03 Thread han cao
Hey everyone:
I am a Master's student at Saarland University, Germany, majoring in
Bioinformatics. I am interested in statistical learning, which will also be
my main work in the future, implemented in R. So I'd like to join
Google Summer of Code this year by doing tasks in your community. However,
I cannot find whether there are any tasks available for this year.
Can anyone tell me?


Hank Cao


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Asssistance

2015-03-03 Thread Evans Otieno Ochiaga
Hi to All,

I am building a package in R and whenever I run command R CMD build OAR
in the terminal, I get the following error:

* checking for file ‘OAR/DESCRIPTION’ ... OK
* preparing ‘OAR’:
* checking DESCRIPTION meta-information ... ERROR
Malformed Depends or Suggests or Imports or Enhances field.
Offending entries:
  R (>=3.0.2)
Entries must be names of packages optionally followed by '<=' or '>=',
white space, and a valid version number in parentheses.

See the information on DESCRIPTION files in section 'Creating R
packages' of the 'Writing R Extensions' manual.

This is my first time to build a package using R and it's very hard for me
to figure out where the problem is. I kindly call for your assistance in
fixing the problem. Below is my function;

bcidata <- read.csv("~/Desktop/Files_for_Package/data.csv"); bcidata

Modelsfunc <- function(bcidata){

  occupancymean.data.frame <- NULL

  for (k in seq(2.5, 250, by = 2.5)){
    i <- 1000/k
    j <- 500/k
    bcidata$Xgrid <- cut(bcidata$PX, breaks = i, include.lowest = TRUE)
    bcidata$Ygrid <- cut(bcidata$PY, breaks = j, include.lowest = TRUE)
    bcidata$IDgrid <- with(bcidata, interaction(Xgrid, Ygrid))
    bcidata$IDNgrid <- factor(bcidata$IDgrid)
    levels(bcidata$IDgrid) <- seq_along(levels(bcidata$IDgrid))
    bcidata$count <- ave(bcidata$PX, bcidata$IDgrid, FUN = length)
    aggregate <- aggregate(bcidata$PX, bcidata[, c("Xgrid", "Ygrid", "IDNgrid")],
                           FUN = length)
    Totalgrids <- length(levels(bcidata$IDgrid))
    Occupiedgrids <- length(aggregate$IDNgrid)
    sum <- sum(aggregate$x)
    TotalArea <- 50
    Area <- (1000/i*500/j)
    Occupancy <- (Occupiedgrids/Totalgrids)
    Mean <- length(bcidata$Latin)/(Occupiedgrids)
    Variance <- var(aggregate$x)
    occupancymean.data.frame <- rbind(occupancymean.data.frame,
      data.frame(Area, Totalgrids, Occupiedgrids, Occupancy, Mean, Variance))
  }

  occupancymean.data.frame

  Occupancy <- occupancymean.data.frame$Occupancy
  Mean <- occupancymean.data.frame$Mean

  poission <- nls(Occupancy ~ 1-exp(-rho*Mean), start = list(rho = 2.1),
                  data = occupancymean.data.frame)

  nachman <- nls(Occupancy ~ 1-exp(-alpha*Mean^beta),
                 start = list(alpha = 0.2, beta = 0.1), data = occupancymean.data.frame)

  logistic <- nls(Occupancy ~ (alpha*Mean^beta)/(1+alpha*Mean^beta),
                  start = list(alpha = 0.2, beta = 0.1), data = occupancymean.data.frame)

  nbd <- nls(Occupancy ~ 1-(1+(Mean)/k)^-k, start = list(k = 1),
             data = occupancymean.data.frame)

  power <- nls(Occupancy ~ alpha*Mean^beta, start = list(alpha = 0.2, beta = 0.1),
               data = occupancymean.data.frame)

  inbd <- nls(Occupancy ~ 1-(alpha*(Mean^(beta-1)))^(Mean/(1-alpha*Mean^(beta-1))),
              start = list(alpha = 0.2, beta = 0.3), data = occupancymean.data.frame)

  fnbd <- nls(Occupancy ~ 1- (gamma(N + k/(Mean*A/N)-k)*gamma(k/(Mean*A/N)))/
                (gamma(k/(Mean*A/N)-k)*gamma(N+k/(Mean*A/N))),
              start = list(k = 0.2, A = 0.1, N = 0.2), data = occupancymean.data.frame)

  bayesianII <- nls(Occupancy ~ 1-(theta*beta^(2*(TotalArea*Mean/sum)^0.5)*delta^(TotalArea*Mean/sum)),
                    start = list(theta = 0.9956, beta = 1, delta = 1),
                    data = occupancymean.data.frame)

  return(list(summary(poission), summary(nachman), summary(logistic), summary(nbd),
              summary(power), summary(inbd), summary(fnbd), summary(bayesianII)))
}

Modelsfunc(bcidata)

Your assistance will be highly appreciated. Thanks in advance.

Regards,


*Evans Ochiaga*

*African Institute for Mathematical Sciences*

*6 Melrose Road*

*Muizenberg, South Africa*

*Msc in Mathematical Sciences+27 84 61 69 183 *

*When I cannot understand my Father’s leading, And it seems to be but hard
and cruel fate, Still I hear that gentle whisper ever pleading, God is
working, God is faithful—Only wait.*


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Asssistance

2015-03-03 Thread Duncan Murdoch
On 03/03/2015 5:47 AM, Evans Otieno Ochiaga wrote:
 Hi to All,
 
 I am building a package in R and whenever I run command R CMD build OAR
 in the terminal, I get the following error:
 
 * checking for file ‘OAR/DESCRIPTION’ ... OK
 * preparing ‘OAR’:
 * checking DESCRIPTION meta-information ... ERROR
 Malformed Depends or Suggests or Imports or Enhances field.
 Offending entries:
   R (>=3.0.2)
 Entries must be names of packages optionally followed by '<=' or '>=',
 white space, and a valid version number in parentheses.

That looks okay; I'm guessing it is out of place.  Can you show us your
DESCRIPTION file?

Duncan Murdoch

 


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Asssistance

2015-03-03 Thread Gregor Kastner
Hi Evans,

 * checking for file ‘OAR/DESCRIPTION’ ... OK
 * preparing ‘OAR’:
 * checking DESCRIPTION meta-information ... ERROR
 Malformed Depends or Suggests or Imports or Enhances field.
 Offending entries:
   R (>=3.0.2)
 Entries must be names of packages optionally followed by '<=' or '>=',
 white space, and a valid version number in parentheses.

The _white space_ (see explanation above) seems to be missing.

Try R (>= 3.0.2)

Best,
/g
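To see why `R (>= 3.0.2)` passes where `R (>=3.0.2)` is rejected, here is a rough check written against the rule quoted in the error message. The DESCRIPTION text and the helper `dep_entry_ok()` are hypothetical; the regular expression only approximates that rule and is not R's own validation code.

```r
# Hypothetical DESCRIPTION content for the OAR package (field values assumed)
desc_text <- c("Package: OAR",
               "Version: 0.1",
               "Depends: R (>= 3.0.2)")

con <- textConnection(desc_text)
desc <- read.dcf(con)
close(con)

# Split the Depends field into individual entries
deps <- trimws(strsplit(desc[1, "Depends"], ",")[[1]])

# Approximation of the quoted rule: a package name, optionally followed by
# '<=' or '>=', white space, and a version number, all in parentheses
dep_entry_ok <- function(entry) {
  grepl("^[[:alnum:].]+([[:space:]]*\\((<=|>=)[[:space:]]+[0-9]+([.-][0-9]+)*\\))?$",
        entry)
}

dep_entry_ok(deps)           # TRUE
dep_entry_ok("R (>=3.0.2)")  # FALSE: no white space after '>='
```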

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel