Chris,

On Jan 7, 2013, at 6:23 PM, Chris Jewell wrote:

> Hi All,
>
> I'm currently trying to write an S4 class that mimics a data.frame, but
> stores data on disc in HDF5 format. The idea is that the dataset is likely
> to be too large to fit into a standard desktop machine, and by using
> subscripts, the user may load bits of the dataset at a time. eg:
>
>> myLargeData <- LargeData("/path/to/file")
>> mySubSet <- myLargeData[1:10, seq(1,15,by=3)]
>
> I've therefore defined my LargeData class thus
>
>> LargeData <- setClass("LargeData", representation(filename="character"))
>> setMethod("initialize", "LargeData", function(.Object, filename) {
>>   .Object@filename <- filename
>>   .Object  # an initialize method must return the object
>> })
>
> I've then defined the "[" method to call a C++ function (Rcpp), opening the
> HDF5 file, and returning the required rows/cols as a data.frame.
>
> However, what if the user wants to load the entire dataset into memory?
> Which method do I overload to achieve the following?
>
>> fullData <- myLargeData
>> class(fullData)
> [1] "data.frame"
>

That makes no sense: a <- b is not a transformation, so "a" will have the
same value as "b" by definition, and thus the same class. If you really meant

fullData <- as.data.frame(myLargeData)

then you just need to implement the as.data.frame() method for your class.
Note, however, that a more common way to convert a big data reference to the
native format in its entirety is simply

myLargeData[]

-- you may want to have a look at the (many) existing big data packages
(AFAIR bigmemory uses a C++ back-end as well). Also note that indexing is
tricky in R and easy to get wrong (remember: negative indices, indexing by
name, etc.)

> or apply transformations:
>
>> myEigen <- eigen(myLargeData)
>
> In C++ I would normally overload the "double" or "float" operator to achieve
> this -- can I do the same thing in R?
>

Again, there is no implicit coercion in R (you cannot declare a variable's
type in advance), so it doesn't make sense in the context you have in mind
from C++ -- in R the equivalent is simply implementing an as.double() method,
but I suspect that's not what you had in mind. For generics you can simply
implement a method for your class (one that does the coercion, for example,
or uses a more efficient way). If you cannot define a generic, or don't want
to write your own methods, then you have a problem: the only theoretical way
is to subclass the numeric vector class, but that is not possible in R if you
want to change the representation, because the call falls through to the
more efficient internal code (without extra dispatch) before your code gets
a chance to intervene.

Cheers,
Simon

> Thanks,
>
> Chris
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
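
For illustration, the suggestions in the reply above (an as.data.frame()
method, loading everything with an empty subscript, and explicit coercion
before calling eigen()) could be sketched roughly as follows. This is only a
sketch under assumptions that are not in the thread: an RDS file read with
readRDS() stands in for the HDF5 file and the Rcpp reader, loadAll() is a
hypothetical helper, and the "[" method always returns a data.frame and does
not reproduce data.frame semantics for the single-index form x[i].

```r
library(methods)

# Hypothetical stand-in for the HDF5-backed class from the thread: the data
# live in an RDS file and readRDS() plays the role of the C++/HDF5 reader.
setClass("LargeData", representation(filename = "character"))
LargeData <- function(filename) new("LargeData", filename = filename)

# Hypothetical helper: load the whole dataset from disk.
loadAll <- function(x) readRDS(x@filename)

# "[" with both indices missing returns everything, so x[] loads the full
# dataset. Delegating to the data.frame method keeps negative and by-name
# indices working. The drop argument is ignored in this sketch, and the
# one-index form x[i] is treated as row selection (unlike a data.frame).
setMethod("[", "LargeData", function(x, i, j, ..., drop = TRUE) {
  df <- loadAll(x)
  if (missing(i) && missing(j)) df
  else if (missing(i)) df[, j, drop = FALSE]
  else if (missing(j)) df[i, , drop = FALSE]
  else df[i, j, drop = FALSE]
})

# Explicit coercions, written as S3 methods for the existing generics.
as.data.frame.LargeData <- function(x, row.names = NULL, optional = FALSE, ...) x[]
as.matrix.LargeData <- function(x, ...) as.matrix(x[])

# Usage with a small symmetric dataset written to a temporary file.
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(a = c(2, 1, 0), b = c(1, 2, 0), c = c(0, 0, 5)), path)

myLargeData <- LargeData(path)
fullData <- as.data.frame(myLargeData)     # coercion is requested explicitly
class(fullData)                            # "data.frame"
mySubSet <- myLargeData[-1, c("a", "c")]   # negative and by-name indices
myEigen  <- eigen(as.matrix(myLargeData))  # coerce first, then transform
```

A real implementation would read only the requested rows and columns from
disk rather than loading everything and subsetting in memory; the sketch
only shows where the methods attach.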