.subset and .subset2 are equivalent to [ and [[ except that method dispatch does not take place. See ?.subset
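A minimal sketch of the difference, assuming a small made-up data frame (the name df and its contents are purely illustrative and not taken from the thread). It only shows that the dispatching operators keep the data frame class while .subset and .subset2 work on the underlying list; it makes no claim about timings.

```r
# Illustrative data only: a tiny data frame with two columns.
df <- data.frame(X = 1:3, Y = c(0.5, 1.5, 2.5))

# `[` dispatches to `[.data.frame`, so the result is still a data frame.
a <- df[1]
is.data.frame(a)

# .subset() skips dispatch and subsets the underlying list, so the
# result is a plain named list holding the selected column.
b <- .subset(df, 1)
is.data.frame(b)
names(b)

# .subset2() is the non-dispatching counterpart of `[[` and returns
# the column itself.
y <- .subset2(df, "Y")
identical(y, df[["Y"]])
```

Note that ?.subset says these functions should not normally be invoked by end users, and that, unlike the operators, they do not allow missing arguments, so there is no direct equivalent of A[,1].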

On 11/8/06, Vladimir Dergachev <[EMAIL PROTECTED]> wrote:
> On Wednesday 08 November 2006 3:21 am, Prof Brian Ripley wrote:
> > >
> > > So far I was not able to figure out why this is necessary -
> > > could anyone help ?
> >
> > You need to remove the class to avoid recursion: a few lines later x[i]
> > needs to be a call to the primitive and not the data frame method.
>
> I see. Is there a way to get at the primitive directly, i.e. something like
> `[.list`(x, i) ?
>
> > >
> > > The reason I am looking at it is that changing attributes forces
> > > duplication of the data frame and this is the largest cause of slowness
> > > of data.frames in general.
> >
> > Do you have evidence of that? R has facilities to profile its code, and I
> > have never seen [.data.frame taking a significant proportion of the total
> > time. If it does for your application, consider if a data frame is an
> > appropriate way to store your data. I am not sure we would accept that
> > data frames do have 'slowness in general', but their generality does make
> > them slower than alternatives where the generality is not needed.
>
> Evidence:
>
> # this can be copy'n'pasted directly into an R session
> # small N - both system calls return small, but comparable running
> # times
> N<-100000
> A<-data.frame(X=1:N, Y=rnorm(N), Z=as.character(rnorm(N)))
> system.time(B<-A[,1])
> system.time(B<-A[1,1])
>
> #larger N - both times are larger and still comparable
> N<-1000000
> A<-data.frame(X=1:N, Y=rnorm(N), Z=as.character(rnorm(N)))
> system.time(B<-A[,1])
> system.time(B<-A[1,1])
>
> The running times would also grow with the number of columns. Also I have
> modified 2.4.0 version of R to print out large allocations and I get the
> impression that the data frame is being duplicated. Same happens for
> `[<-.data.frame` - but this function has much more complex code, I have not
> looked through it yet.
>
> Of course, getting a small portion (i.e. A[1:5,]) also takes a lot of time -
> but the examples showed above should be O(1).
>
> My data is a result of data base query - it has naturally columns of different
> types and the columns are named (no row.names though) - which is why I used
> data.frames. What would you suggest ?
>
> thank you very much !
>
> Vladimir Dergachev
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel