Dear Herve,

>>>>> Hervé Pagès <hpa...@fhcrc.org>
>>>>>     on Tue, 23 Apr 2013 23:09:21 -0700 writes:
> Hi,

> In the man page for is.unsorted():

>   Value:

>      A length-one logical value.  All objects of length 0 or 1 are
>      sorted: the result will be ‘NA’ for objects of length 2 or more
>      except for atomic vectors and objects with a class (where the
>      ‘>=’ or ‘>’ method is used to compare ‘x[i]’ with ‘x[i-1]’ for
>      ‘i’ in ‘2:length(x)’).

> This contains many incorrect statements:

>> length(NA)
> [1] 1
>> is.unsorted(NA)
> [1] NA
>> length(list(NA))
> [1] 1
>> is.unsorted(list(NA))
> [1] NA

> => Contradicts "all objects of length 0 or 1 are sorted".

>> is.unsorted(raw(2))
> Error in is.unsorted(raw(2)) : unimplemented type 'raw' in 'isUnsorted'

> => Doesn't agree with the doc (unless "except for atomic vectors"
> means "it might fail for atomic vectors").

>> setClass("A", representation(aa="integer"))
>> a <- new("A", aa=4:1)
>> length(a)
> [1] 1
>> is.unsorted(a)
> [1] FALSE
> Warning message:
> In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'

> => Ok, but it's arguable the warning is useful/justified from a
> user point of view. The warning *seems* to suggest that defining an
> "is.na" method for my objects is required for is.unsorted() to work
> properly but the doc doesn't make this clear.

> Anyway, let's define one, so the warning goes away:

>> setMethod("is.na", "A", function(x) is.na(x@aa))
> [1] "is.na"

> Let's define a "length" method:

>> setMethod("length", "A", function(x) length(x@aa))
> [1] "length"
>> length(a)
> [1] 4
>> is.unsorted(a)
> [1] FALSE

> => Is this correct? Hard to know. The doc is not clear about what
> should happen for objects of length 2 or more and with a class but
> with no ">=" or ">" methods.
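To make the rule in the "Value" section concrete, here is a minimal plain-R sketch of the pairwise comparison it describes: compare x[i] with x[i-1] for i in 2:length(x). The helper name is_unsorted_ref is hypothetical and not part of base R; it only illustrates the documented rule and does not reproduce is.unsorted()'s NA handling or its C implementation.

```r
# Hypothetical helper (not part of base R): a plain-R reading of the rule
# "compare x[i] with x[i-1] for i in 2:length(x)" from ?is.unsorted.
is_unsorted_ref <- function(x, strictly = FALSE) {
  n <- length(x)
  if (n < 2L) return(FALSE)      # length 0 or 1: sorted by definition
  cur  <- x[-1L]                 # x[2], ..., x[n]
  prev <- x[-n]                  # x[1], ..., x[n-1]
  ok <- if (strictly) cur > prev else cur >= prev
  !all(ok)                       # TRUE if any pair is out of order
}

is_unsorted_ref(4:1)                         # TRUE
is_unsorted_ref(c(1, 1, 2))                  # FALSE
is_unsorted_ref(c(1, 1, 2), strictly = TRUE) # TRUE
```

Since the sketch relies only on length(), "[" and ">=" / ">", it should also work on a classed object once those methods exist, e.g. the class "A" object a <- new("A", aa=4:1) together with the "length", "[" and ">=" methods defined in this thread.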
> Let's define "[", ">=", and ">":

>> setMethod("[", "A", function(x, i, j, ..., drop=TRUE)
>>     new("A", aa=x@aa[i]))
> [1] "["
>> rev(a)
> An object of class "A"
> Slot "aa":
> [1] 1 2 3 4

>> setMethod(">=", c("A", "A"), function(e1, e2) {e1@aa >= e2@aa})
> [1] ">="
>> a >= a[3]
> [1]  TRUE  TRUE  TRUE FALSE
>> setMethod(">", c("A", "A"), function(e1, e2) {e1@aa > e2@aa})
> [1] ">"
>> a > a[3]
> [1]  TRUE  TRUE FALSE FALSE
>> is.unsorted(a)
> [1] FALSE
>> is.unsorted(rev(a))
> [1] FALSE

> Still not working as expected. So what's required exactly for making
> is.unsorted() work on an object "with a class"?

Well, read the source code. :-) ;-)

More seriously: on another, hidden help page, you find

    \code{.gt} and \code{.gtn} are callbacks from \code{\link{rank}} and
    \code{\link{is.unsorted}} used for classed objects.

In other words, you'd need to define a method for .gtn for S4 objects
in this case. ... and yes, indeed, I don't know why this is not
documented at all.

> BTW, is.unsorted() would be *much* faster, at least on atomic
> vectors, without those calls to is.na().

Well, in all R versions apart from R-devel as of yesterday, the source
of is.unsorted() has been

    is.unsorted <- function(x, na.rm = FALSE, strictly = FALSE)
    {
        if(is.null(x)) return(FALSE)
        if(!na.rm && any(is.na(x)))## "FIXME" is.na(<large>) is "too slow"
            return(NA)
        ## else
        if(na.rm && any(ii <- is.na(x)))
            x <- x[!ii]
        .Internal(is.unsorted(x, strictly))
    }

so you see the "FIXME". In R-devel (and probably R-patched in the near
future), that line is

        if(!na.rm && anyMissing(x))

so there's no slow code anymore, at least not for the default case of
na.rm = FALSE.

> The C code could check for NAs, without having to do this as a first
> pass on the full vector like it is the case with the current
> implementation. If the vector is unsorted, the C code is typically
> able to bail out early, so the speed-up will typically be 10000x or
> more if the vector has millions of elements.
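A minimal sketch of the single-pass idea, written in R only to show the control flow (an actual speed-up would need the same loop in C). It covers the na.rm = TRUE case, where skipping NAs inside the loop should give the same answer as the x[!ii] subsetting in the source quoted above, while still allowing an early exit. The helper name is_unsorted_narm_1pass is hypothetical, and the sketch assumes an atomic vector.

```r
# Hypothetical helper (not part of base R): one pass over an atomic vector
# that skips NAs and stops at the first out-of-order pair. Intended to match
# is.unsorted(x, na.rm = TRUE) without first building x[!is.na(x)].
is_unsorted_narm_1pass <- function(x, strictly = FALSE) {
  have_prev <- FALSE
  prev <- NULL
  for (i in seq_along(x)) {
    xi <- x[[i]]
    if (is.na(xi)) next                  # na.rm = TRUE: ignore missing values
    if (have_prev) {
      out_of_order <- if (strictly) xi <= prev else xi < prev
      if (out_of_order) return(TRUE)     # early exit: the rest is never scanned
    }
    prev <- xi
    have_prev <- TRUE
  }
  FALSE
}

is_unsorted_narm_1pass(c(1, NA, 2, 2, 5))  # FALSE
is_unsorted_narm_1pass(c(3, NA, 1))        # TRUE
```

The sketch deliberately stops at na.rm = TRUE: with na.rm = FALSE an early exit would change results, since is.unsorted(c(2, 1, NA)) is NA, whereas a loop that returns at the first descending pair would give TRUE.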
You are right about checking for NAs in C (but again: the most
important case, na.rm = FALSE, has been "solved" already, I'd say), but
you know well that we do gratefully accept good patches to the R
sources.

> Thanks,
> H.

>> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)

> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base

> loaded via a namespace (and not attached):
> [1] tools_3.0.0

> --
> Hervé Pagès

> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024

> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319

> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel