[Rd] ranges and contiguity checking

2010-05-12 Thread James Bullard
Hi All,

I am interfacing to some C libraries (hdf5) and I have methods defined for
'['. These methods do hyperslab selection; however, I am currently
limiting slab selection to contiguous blocks, i.e., things defined like
i:(i+k). I don't do any contiguity checking at this point; I just grab the
max and min of the range and then potentially do an in-memory subselection,
which is what I am definitely trying to avoid. Besides using deparse, I
can't see any way to figure out that these things (i:(i+k) and c(i, i+1,
..., i+k)) are different.

I have always liked how 1:10 is a valid expression in R (as opposed to
Python, where it is not valid by itself). However, I'd somehow like to know
that the thing is a contiguous range without examining the unevaluated
expression or, worse, computing all(diff(i:(i+k)) == 1).
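
For reference, the contiguity test mentioned above can be preceded by a constant-time endpoint comparison, so that many non-contiguous index vectors are rejected without a full scan. This is only a minimal sketch: the name is_contiguous is hypothetical, and it assumes whole-number indices with no missing values.

```r
# Hypothetical helper: TRUE if x is an increasing run of consecutive
# whole numbers (assumes x holds whole-number indices).
is_contiguous <- function(x) {
  n <- length(x)
  if (n == 0L || anyNA(x)) return(FALSE)
  if (n == 1L) return(TRUE)
  # Cheap necessary condition: the endpoints must span exactly n values.
  if (x[n] - x[1L] != n - 1L) return(FALSE)
  # Full scan is still required: c(1, 3, 2, 4) passes the endpoint test.
  all(diff(x) == 1L)
}

is_contiguous(3:7)            # TRUE
is_contiguous(c(1, 3, 2, 4))  # FALSE
is_contiguous(c(1, 2, 4))     # FALSE
```

The endpoint comparison only helps reject bad input early; a vector that really is contiguous still costs one pass over all of its elements.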

thanks, jim

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ranges and contiguity checking

2010-05-12 Thread Duncan Murdoch

On 12/05/2010 2:18 PM, James Bullard wrote:
> [...] I'd somehow like to know that
> the thing was a contiguous range without examining the un-evaluated
> expression or worse, all(diff(i:(i+k)) == 1)

You can implement all(diff(x) == 1) more efficiently in C, but I don't
see how you could hope to do any better than that without putting very
un-R-like restrictions on your code.  Do you really want to say that

    A[i:(i+k)]

is legal, but

    x <- i:(i+k)
    A[x]

is not?  That will be very confusing for your users.  The problem is
that objects don't remember where they came from, only arguments to
functions do, and functions that make use of this fact mainly do it for
decorating the output (nice labels in plots) or making error messages
more intelligible.
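
A small illustration of that distinction, as a sketch only (the helper name show_index_expr is hypothetical): the evaluated values are indistinguishable, and the original expression is visible only to the function that receives it directly as an argument.

```r
# Hypothetical helper: reports how its argument was written at the call site.
show_index_expr <- function(i) deparse(substitute(i))

a <- 3:6
b <- c(3L, 4L, 5L, 6L)

identical(a, b)        # TRUE: the values carry no trace of ':' or c()
show_index_expr(3:6)   # "3:6"
show_index_expr(a)     # "a": the ':' is gone once the value is stored
```

This is why a '[' method that inspects the unevaluated index would accept A[i:(i+k)] but not the equivalent A[x].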


Duncan Murdoch



Re: [Rd] ranges and contiguity checking

2010-05-12 Thread William Dunlap

You could define a sequence class, say 'hfcSeq',
and insist that the indices given to [.hfc are
hfcSeq objects.  E.g., instead of
    hcf[i:(i+k)]
the user would use
    hcf[hfcSeq(i,i+k)]
or
    index <- hfcSeq(i,i+k)
    hcf[index]
The max, min, and range methods for hfcSeq
would just inspect one or both of its
elements.
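
One way such a class might look, as a rough sketch rather than a definitive implementation: the two-element representation, the validation in the constructor, and the use of a single Summary group method (instead of separate min, max, and range methods) are all assumptions made for illustration.

```r
# Hypothetical constructor: stores only the two endpoints, never the
# full sequence, so construction cost does not depend on the range size.
hfcSeq <- function(from, to) {
  from <- as.integer(from)
  to <- as.integer(to)
  stopifnot(length(from) == 1L, length(to) == 1L,
            !is.na(from), !is.na(to), from <= to)
  structure(c(from, to), class = "hfcSeq")
}

# Group method covering min(), max() and range(); each one only looks
# at the stored endpoints.
Summary.hfcSeq <- function(..., na.rm = FALSE) {
  ends <- unclass(list(...)[[1L]])
  switch(.Generic,
         min = ends[1L],
         max = ends[2L],
         range = ends,
         stop(.Generic, "() is not defined for hfcSeq objects"))
}

index <- hfcSeq(10, 1e6)
min(index)    # 10
max(index)    # 1000000
range(index)  # 10 1000000
```

A '[' method for the dataset class could then read the endpoints directly and request one contiguous block, without ever expanding the index.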

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  



Re: [Rd] ranges and contiguity checking

2010-05-12 Thread James Bullard

I could do this, but I wanted it not to matter to the user whether
they were dealing with an HDF5Dataset or a plain old matrix.

It seems that I cannot define methods on ':'. If I could, then I
could implement an immutable 'range' class, which would be good, but then
I'd also have to implement '['(matrix, range), which would be easy but
still more work than I wanted to do.

I guess I was thinking that there is some inherent value in an immutable
native range type that is constant in time and memory to construct.
Then I could define methods on '['(matrix, range) and '['(matrix,
integer). I'm pretty confident this is more or less what is happening in the
IRanges package in Bioconductor, but (maybe for lack of support for
setting methods on ':') it is happening in a way that makes things very
non-transparent to a user. As it stands, I can optimize for performance by
using an IRanges-type wrapper, or I can optimize for code clarity by killing
performance.

thanks again, jim


Re: [Rd] ranges and contiguity checking

2010-05-12 Thread Jeff Ryan
Providing the wrapper would allow for both performance and
user simplicity.

x[RANGE(1,1e6)] and x[1:1e6] could both be handled internally, where:

RANGE <- function(from,to) {
  structure(seq(from,to), class="RANGE")
}

Just testing for a 'RANGE' object in your '[' method would leave the
optimization up to the end user.
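
A rough sketch of that test inside a '[' method. The 'slab' class here is a hypothetical in-memory stand-in for a disk-backed dataset, and the messages only mark which path was taken; a real method would hand the endpoints to the C library instead.

```r
# Marker-class wrapper, as defined above.
RANGE <- function(from, to) structure(seq(from, to), class = "RANGE")

# Hypothetical stand-in for a disk-backed dataset; it just wraps a vector.
slab <- function(data) structure(list(data = data), class = "slab")

`[.slab` <- function(x, i) {
  data <- unclass(x)$data
  if (inherits(i, "RANGE")) {
    # Fast path: the caller declared the index contiguous, so only the
    # endpoints are needed to describe the block.
    idx <- unclass(i)
    first <- idx[1L]
    last <- idx[length(idx)]
    message("contiguous read: ", first, " to ", last)
    data[seq.int(first, last)]
  } else {
    # General path: arbitrary indices, no contiguity assumed.
    message("general read of ", length(i), " elements")
    data[i]
  }
}

s <- slab(letters)
s[RANGE(2, 4)]   # "b" "c" "d", via the contiguous path
s[c(1, 5)]       # "a" "e", via the general path
```

Note that RANGE as written still materializes the whole sequence; storing only the endpoints would avoid that cost, at the price of a slightly more involved class.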

The 'xts' package provides something similar with respect to subsetting by
time.  We accept a character string conforming to ISO8601 style time ranges,
as well as standard classes that would be available to subset any other
matrix-like object.

The ISO way will get you fast binary searching over the time-index, whereas
using POSIX time is a linear search.

HTH
Jeff




-- 
Jeffrey Ryan
jeffrey.r...@insightalgo.com

ia: insight algorithmics
www.insightalgo.com
