Thanks again! BTW, how did you find the code for "[.default"? I tried: > get("[.default") Error in get(x, envir, mode, inherits) : variable "[.default" was not found
> -----Original Message----- > From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] > Sent: Monday, May 09, 2005 9:46 PM > To: Vadim Ogranovich > Cc: r-devel@stat.math.ethz.ch; [EMAIL PROTECTED] > Subject: Re: [Rd] Light-weight data.frame class: was: how to > add method to .Primitive function > > "[.default" is implemented in R as .subset. See ?.subset and > note that it begins with a dot. e.g. for the case where i > and j are not missing: > > "[.lwdf" <- function(x, i, j) lapply(.subset(x,j), "[", i) > > > > On 5/8/05, Vadim Ogranovich <[EMAIL PROTECTED]> wrote: > > Hi, > > > > Encouraged by a tip from Simon Urbanek I tried to use the > S3 machinery > > to write a faster version of the data.frame class. > > This quickly hits a snag: the "[.default"(x, i) for some > reason cares > > about the dimensionality of x. > > In the end there is a full transcript of my R session. It > includes the > > motivation for writing the class and the problems I have > encountered. > > > > As a result I see three issues here: > > * why "[.default"(x, i) doesn't work if dim(x) is 2? After all a > > single subscript into a vector works regardless of whether it's a > > matrix or not. Is there an alternative way to access "[.default"? > > * why does unclass() make deep copy? This is a facet of the global > > over-conservatism of R with respect to copying. > > * is it possible to add some sort copy profiling to R? > Something like > > copyProfiling(TRUE), which should cause R to log sizes of > each copied > > object (just raw sizes w/o any attempt to identify the > object). This > > feature should at least help assess the magnitude of the problem. > > > > Thanks, > > Vadim > > > > Now the transcript itself: > > > # the motivation: subscription of a data.frame is *much* > (almost 20 > > times) slower than that of a list > > > # compare > > > n = 1e6 > > > i = seq(n) > > > > > > x = data.frame(a=seq(n), b=seq(n)) > > > system.time(x[i,], gcFirst=TRUE) > > [1] 1.01 0.14 1.14 0.00 0.00 > > > > > > x = list(a=seq(n), b=seq(n)) > > > system.time(lapply(x, function(col) col[i]), gcFirst=TRUE) > > [1] 0.06 0.00 0.06 0.00 0.00 > > > > > > > > > # the solution: define methods for the light-weight > data.frame class > > > lwdf = function(...) structure(list(...), class = "lwdf") > > > > > > # dim > > > dim.lwdf = function(x) c(length(x[[1]]), length(x)) > > > > > > # for pretty printing we define print.lwdf via a conversion to > > data.frame > > > # as.data.frame.lwdf > > > as.data.frame.lwdf = function(x) structure(unclass(x), > > class="data.frame", row.names=as.character(seq(nrow(x)))) > > > > > > # print > > > print.lwdf = function(x) print.data.frame(as.data.frame.lwdf(x)) > > > > > > # now the real stuff > > > > > > # "[" > > > # the naive "[.lwdf" = function (x, i, j) lapply(x[j], > function(col) > > col[i]) > > > # won't work because evaluation of x[j] calls "[.lwdf" > again and not > > "[.default" > > > # so we switch by the number of arguments "[.lwdf" = > function (x, i, > > > j) { > > + if (nargs() == 2) > > + NextMethod("[", x, i) > > + else > > + structure(lapply(x[j], function(col) col[i]), class > = "lwdf") } > > > > > > x = lwdf(a=seq(3), b=letters[seq(3)], > c=as.factor(letters[seq(3)])) > > > i = c(1,3); j = c(1,3) > > > > > > # unfortunately, for some reasons "[.default" cares about > > dimensionality of its argument > > > x[i,j] > > Error in "[.default"(x, j) : incorrect number of dimensions > > > > > > > > > # we could use unclass to get it right "[.lwdf" = > function (x, i, j) > > > { > > + structure(lapply(unclass(x)[j], function(col) col[i]), class = > > "lwdf") > > + } > > > > > > x[i,j] > > a c > > 1 1 a > > 2 3 c > > > > > > # *but* unclass creates a deep copy of its argument as indirectly > > evidenced by the following timing > > > x = lwdf(a=seq(1e6)); system.time(unclass(x)) > > [1] 0.01 0.00 0.01 0.00 0.00 > > > x = lwdf(a=seq(1e8)); system.time(unclass(x)) > > [1] 0.44 0.39 0.82 0.00 0.00 > > > > > version > > _ > > platform x86_64-unknown-linux-gnu > > arch x86_64 > > os linux-gnu > > system x86_64, linux-gnu > > status > > major 2 > > minor 0.1 > > year 2004 > > month 11 > > day 15 > > language R > > > > ______________________________________________ > > R-devel@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > ______________________________________________ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel