Stephanie, Actually, it's as.logical that isn't preserving matrix dimensions, because it coerces to a logical vector:
> x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50) > dim(as.logical(x)) NULL ~G On Tue, Mar 3, 2015 at 2:09 PM, Stephanie M. Gogarten < sdmor...@u.washington.edu> wrote: > > > On 3/3/15 1:26 PM, Hervé Pagès wrote: > >> >> >> On 03/03/2015 02:28 AM, Martin Maechler wrote: >> >>> Diverted from R-help : >>> .... as it gets into musing about new R language "primitives" >>> >>> William Dunlap <wdun...@tibco.com> >>>>>>>> on Fri, 27 Feb 2015 08:04:36 -0800 writes: >>>>>>>> >>>>>>> >>> > You could define functions like >>> >>> > is.true <- function(x) !is.na(x) & x >>> > is.false <- function(x) !is.na(x) & !x >>> >>> > and use them in your selections. E.g., >>> >> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10)) >>> >> x[is.true(x$c >= 6), ] >>> > a b c >>> > 7 7 8 7 >>> > 10 10 11 10 >>> >>> > Bill Dunlap >>> > TIBCO Software >>> > wdunlap tibco.com >>> >>> Yes; the Matrix package has had these >>> >>> is0 <- function(x) !is.na(x) & x == 0 >>> isN0 <- function(x) is.na(x) | x != 0 >>> is1 <- function(x) !is.na(x) & x # also == "isTRUE componentwise" >>> >> >> Note that using %in% to block propagation of NAs is about 2x faster: >> >> > x <- sample(c(NA_integer_, 1:10000), 500000, replace=TRUE) >> > microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x) >> Unit: milliseconds >> expr min lq mean median uq >> as.logical(x) %in% TRUE 6.034744 6.264382 6.999083 6.29488 6.346028 >> !is.na(x) & x 11.202808 11.402437 11.469101 11.44848 >> 11.517576 >> max neval >> 40.36472 100 >> 11.90916 100 >> > > Unfortunately %in% does not preserve matrix dimensions: > > > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50) > > dim(x) > [1] 50 10 > > dim(!is.na(x) & x) > [1] 50 10 > > dim(as.logical(x) %in% TRUE) > NULL > > Stephanie > > > >> >> >> >>> namespace hidden for a while [note the comment of the last one!] >>> and using them for readibility in its own code. >>> >>> Maybe we should (again) consider providing some versions of >>> these with R ? >>> >>> The Matrix package also has had fast >>> >>> allFalse <- all0 <- function(x) .Call(R_all0, x) >>> anyFalse <- any0 <- function(x) .Call(R_any0, x) >>> ## >>> ## anyFalse <- function(x) isTRUE(any(!x)) ## ~= any0 >>> ## any0 <- function(x) isTRUE(any(x == 0)) ## ~= anyFalse >>> >>> namespace hidden as well, already, which probably could also be >>> brought to base R. >>> >>> One big reason to *not* go there (to internal C code) at all with R is >>> that >>> S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group generics) >>> and 'is.na() have been known and package writers have >>> programmed methods for these. >>> To ensure that S3 and S4 dispatch works "correctly" also inside >>> such new internals is much less easily achieved, and so >>> such a C-based internal function is0() would no longer be >>> equivalent with !is.na(x) & x == 0 >>> as soon as 'x' is an "object" with a '==', 'Compare' and/or an is.na() >>> method. >>> >> >> Excellent point. Thank you! It really makes a big difference for >> developers who maintain a complex hierarchy of S4 classes and methods, >> when functions like is.true, anyFalse, etc..., which can be expressed in >> terms of more basic operations like ==, !=, !, is.na, etc..., just work >> out-of-the-box on objects for which these basic operations are defined. >> >> There is conceptually a small set of "building blocks", at least for >> objects with a vector-like or list-like semantic, that can be used >> to formally describe the semantic of many functions in base R. This >> is what the man page for anyNA does by saying: >> >> anyNA implements any(is.na(x)) >> >> even though the actual implementation differs, but that's ok, as long >> as anyNA is equivalent to doing any(is.na(x)) on any object for which >> building block is.na() is implemented. >> >> Unfortunately there is no clearly identified set of building blocks >> in base R. For example, if I want the comparison operations to work >> on my object, I need to implement ==, >, <, !=, <=, and >= (the >> 'Compare' group generics) even though it should be enough to implement >> == and >=, because all the others can be described in terms of these >> 2 building blocks. unique/duplicated is another example (unique(x) is >> conceptually x[!duplicated(x)]). And so on... >> >> Cheers, >> H. >> >> >>> OTOH, simple R versions such as your 'is.true', called 'is1' >>> inside Matrix maybe optimizable a bit by the byte compiler (and >>> jit and other such tricks) and still keep the full >>> semantic including correct method dispatch. >>> >>> Martin Maechler, ETH Zurich >>> >>> >>> > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski < >>> > dimitri.liakhovit...@gmail.com> wrote: >>> >>> >> Thank you very much, Duncan. >>> >> All this being said: >>> >> >>> >> What would you say is the most elegant and most safe way to >>> solve such >>> >> a seemingly simple task? >>> >> >>> >> Thank you! >>> >> >>> >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch >>> >> <murdoch.dun...@gmail.com> wrote: >>> >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote: >>> >> >> So, Duncan, do I understand you correctly: >>> >> >> >>> >> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so >>> it returns >>> >> >> a logical value of NA. >>> >> > >>> >> > Yes, when x$x is NA. (Though I think you meant x$c.) >>> >> > >>> >> >> When this logical value is applied to a row, the R says: >>> hell, I don't >>> >> >> know if I should keep it or not, so, just in case, I am >>> going to keep >>> >> >> it, but I'll replace all the values in this row with NAs? >>> >> > >>> >> > Yes. Indexing with a logical NA is probably a mistake, and >>> this is one >>> >> > way to signal it without actually triggering a warning or >>> error. >>> >> > >>> >> > BTW, I should have mentioned that the example where you >>> indexed using >>> >> > -which(x$c>=6) is a bad idea: if none of the entries were 6 >>> or more, >>> >> > this would be indexing with an empty vector, and you'd get >>> nothing, not >>> >> > everything. >>> >> > >>> >> > Duncan Murdoch >>> >> > >>> >> > >>> >> >> >>> >> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch >>> >> >> <murdoch.dun...@gmail.com> wrote: >>> >> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote: >>> >> >>>> I know how to get the output I need, but I would benefit >>> from an >>> >> >>>> explanation why R behaves the way it does. >>> >> >>>> >>> >> >>>> # I have a data frame x: >>> >> >>>> x = data.frame(a=1:10,b=2:11,c=c( >>> 1,NA,3,NA,5,NA,7,NA,NA,10)) >>> >> >>>> x >>> >> >>>> # I want to toss rows in x that contain values >=6. But I >>> don't want >>> >> >>>> to toss my NAs there. >>> >> >>>> >>> >> >>>> subset(x,c<6) # Works correctly, but removes NAs in c, >>> understand why >>> >> >>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, >>> understand >>> >> why >>> >> >>>> x[-which(x$c>=6),] # output I need >>> >> >>>> >>> >> >>>> # Here is my question: why does the following line >>> replace the values >>> >> >>>> of all rows that contain an NA # in x$c with NAs? >>> >> >>>> >>> >> >>>> x[x$c<6,] # Leaves rows with c=NA, but makes the whole >>> row an NA. >>> >> Why??? >>> >> >>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be >>> >> super-explicit >>> >> >>>> >>> >> >>>> Thank you very much! >>> >> >>> >>> >> >>> Most of your examples (except the ones using which()) are >>> doing logical >>> >> >>> indexing. In logical indexing, TRUE keeps a line, FALSE >>> drops the >>> >> line, >>> >> >>> and NA returns NA. Since "x$c < 6" is NA if x$c is NA, >>> you get the >>> >> >>> third kind of indexing. >>> >> >>> >>> >> >>> Your last example works because in the cases where x$c is >>> NA, it >>> >> >>> evaluates NA | TRUE, and that evaluates to TRUE. In the >>> cases where >>> >> x$c >>> >> >>> is not NA, you get x$c < 6 | FALSE, and that's the same as >>> x$c < 6, >>> >> >>> which will be either TRUE or FALSE. >>> >> >>> >>> >> >>> Duncan Murdoch >>> >> >>> >>> >> >> >>> >> >> >>> >> >> >>> >> > >>> >> >>> >> >>> >> >>> >> -- >>> >> Dimitri Liakhovitski >>> >> >>> >> ______________________________________________ >>> >> r-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, >>> see >>> >> https://stat.ethz.ch/mailman/listinfo/r-help >>> >> PLEASE do read the posting guide >>> >> http://www.R-project.org/posting-guide.html >>> >> and provide commented, minimal, self-contained, reproducible >>> code. >>> >> >>> >>> > [[alternative HTML version deleted]] >>> >>> > ______________________________________________ >>> > r-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> > https://stat.ethz.ch/mailman/listinfo/r-help >>> > PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> > and provide commented, minimal, self-contained, reproducible code. >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >>> >> > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Gabriel Becker, PhD Computational Biologist Bioinformatics and Computational Biology Genentech, Inc. [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel