Providing the wrapper would allow for both performance and user simplicity.
x[RANGE(1,1e6)] and x[1:1e6] could both be handled internally, where:

  RANGE <- function(from, to) {
    structure(seq(from, to), class="RANGE")
  }

Testing for a 'RANGE' object in your '[' method would leave the choice of
optimization up to the end user.

The 'xts' package provides something similar for subsetting by time. We
accept a character string conforming to ISO8601-style time ranges, as well
as the standard classes that can be used to subset any other matrix-like
object.

The ISO way will get you fast binary searching over the time index, whereas
using POSIX time is a linear search.

HTH
Jeff

On Wed, May 12, 2010 at 3:27 PM, James Bullard <bull...@stat.berkeley.edu> wrote:

> >> -----Original Message-----
> >> From: r-devel-boun...@r-project.org
> >> [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan Murdoch
> >> Sent: Wednesday, May 12, 2010 11:35 AM
> >> To: bull...@stat.berkeley.edu
> >> Cc: r-de...@stat.math.ethz.ch
> >> Subject: Re: [Rd] ranges and contiguity checking
> >>
> >> On 12/05/2010 2:18 PM, James Bullard wrote:
> >> > Hi All,
> >> >
> >> > I am interfacing to some C libraries (hdf5) and I have methods
> >> > defined for '[', these methods do hyperslab selection, however,
> >> > currently I am limiting slab selection to contiguous blocks, i.e.,
> >> > things defined like: i:(i+k). I don't do any contiguity checking at
> >> > this point, I just grab the max and min of the range and then
> >> > potentially do an in-memory subselection which is what I am
> >> > definitely trying to avoid. Besides using deparse, I can't see any
> >> > way to figure out that these things (i:(i+k) and c(i, i+1, ...,
> >> > i+k)) are different.
> >> >
> >> > I have always liked how 1:10 was a valid expression in R (as
> >> > opposed to python where it is not by itself.), however I'd somehow
> >> > like to know that the thing was contiguous range without examining
> >> > the un-evaluated expression or worse, all(diff(i:(i+k)) == 1)
> >
> > You could define a sequence class, say 'hfcSeq'
> > and insist that the indices given to [.hfc are
> > hfcSeq objects. E.g., instead of
> >    hcf[i:(i+k)]
> > the user would use
> >    hcf[hfcSeq(i,i+k)]
> > or
> >    index <- hfcSeq(i,i+k)
> >    hcf[index]
> > max, min, and range methods for hfcSeq
> > would just inspect one or both of its
> > elements.
>
> I could do this, but I wanted it to not matter to the user whether or not
> they were dealing with a HDF5Dataset or a plain-old matrix.
>
> It seems like I cannot define methods on: ':'. If I could do that then I
> could implement an immutable 'range' class which would be good, but then
> I'd have to also implement: '['(matrix, range) -- which would be easy, but
> still more work than I wanted to do.
>
> I guess I was thinking that there is some inherent value in an immutable
> native range type which is constant in time and memory for construction.
> Then I could define methods on '['(matrix, range) and '['(matrix,
> integer). I'm pretty confident this is more or less what is happening in
> the IRanges package in Bioconductor, but (maybe for the lack of support
> for setting methods on ':') it is happening in a way that makes things
> very non-transparent to a user. As it stands, I can optimize for
> performance by using an IRange-type wrapper or I can optimize for
> code-clarity by killing performance.
>
> thanks again, jim
>
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >>
> >> You can implement all(diff(x) == 1) more efficiently in C, but I
> >> don't see how you could hope to do any better than that without
> >> putting very un-R-like restrictions on your code. Do you really want
> >> to say that
> >>
> >>    A[i:(i+k)]
> >>
> >> is legal, but
> >>
> >>    x <- i:(i+k)
> >>    A[x]
> >>
> >> is not? That will be very confusing for your users. The problem is
> >> that objects don't remember where they came from, only arguments to
> >> functions do, and functions that make use of this fact mainly do it
> >> for decorating the output (nice labels in plots) or making error
> >> messages more intelligible.
> >>
> >> Duncan Murdoch
> >>
> >> ______________________________________________
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Jeffrey Ryan
jeffrey.r...@insightalgo.com

ia: insight algorithmics
www.insightalgo.com
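
For readers following the thread, the RANGE suggestion at the top of this
message could be sketched roughly as below. RANGE() is the constructor given
in the message; the "slab" class, slab(), and read_block() are hypothetical
stand-ins for an on-disk dataset and its contiguous block read, and the sketch
assumes positive integer indices only. It is an illustration of the dispatch
idea, not how xts, IRanges, or any HDF5 binding actually implements it.

```r
# RANGE() is taken from the message above; all other names are hypothetical.
RANGE <- function(from, to) {
  structure(seq(from, to), class = "RANGE")
}

# Toy container standing in for an on-disk dataset.
slab <- function(x) structure(list(data = x), class = "slab")

# Stand-in for a contiguous block read (e.g. a hyperslab selection).
read_block <- function(x, lo, hi) x$data[lo:hi]

"[.slab" <- function(x, i) {
  if (inherits(i, "RANGE")) {
    # Contiguous by construction: only the two endpoints are inspected.
    idx <- unclass(i)
    read_block(x, idx[1L], idx[length(idx)])
  } else if (length(i) > 1L && all(diff(i) == 1)) {
    # Plain index that happens to be contiguous, detected by an O(n) scan.
    read_block(x, i[1L], i[length(i)])
  } else {
    # Arbitrary index: read the covering block, then subselect in memory.
    block <- read_block(x, min(i), max(i))
    block[i - min(i) + 1L]
  }
}

s <- slab(c(10, 20, 30, 40, 50))
s[RANGE(2, 4)]   # fast path, no scan
s[2:4]           # same result, after the diff() scan
s[c(1, 3, 5)]    # general path
```

The user-visible result is the same on every path; the RANGE class only lets
the caller opt in to skipping the contiguity check, which is the trade-off
discussed in the thread.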